././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1753638554.1194756 gallery_dl-1.30.2/0000755000175000017500000000000015041463232012407 5ustar00mikemike././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753638551.0 gallery_dl-1.30.2/CHANGELOG.md0000644000175000017500000000203415041463227014223 0ustar00mikemike## 1.30.2 - 2025-07-27 ### Extractors #### Additions - [itaku] add `posts` & `bookmarks` extractors ([#7707](https://github.com/mikf/gallery-dl/issues/7707)) #### Fixes - [kemono] support new `kemono.cr` domain ([#7902](https://github.com/mikf/gallery-dl/issues/7902) [#7909](https://github.com/mikf/gallery-dl/issues/7909) [#7911](https://github.com/mikf/gallery-dl/issues/7911) [#7913](https://github.com/mikf/gallery-dl/issues/7913) [#7904](https://github.com/mikf/gallery-dl/issues/7904)) - [coomer] support new `coomer.st` domain ([#7907](https://github.com/mikf/gallery-dl/issues/7907) [#7909](https://github.com/mikf/gallery-dl/issues/7909) [#7911](https://github.com/mikf/gallery-dl/issues/7911) [#7904](https://github.com/mikf/gallery-dl/issues/7904)) ### Post Processors - [exec] use `False` as `start_new_session` default to avoid a `TypeError` ([#7899](https://github.com/mikf/gallery-dl/issues/7899)) ### Miscellaneous - [tests/postprocessor] fix `TypeError` when logging an error ([#6582](https://github.com/mikf/gallery-dl/issues/6582)) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1743510441.0 gallery_dl-1.30.2/LICENSE0000644000175000017500000004325414772755651013447 0ustar00mikemike GNU GENERAL PUBLIC LICENSE Version 2, June 1991 Copyright (C) 1989, 1991 Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. Preamble The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Lesser General Public License instead.) You can apply it to your programs, too. When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things. To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it. For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights. 
We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software. Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations. Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all. The precise terms and conditions for copying, distribution and modification follow. GNU GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modification".) Each licensee is addressed as "you". Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does. 1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program. You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License. 
c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.) These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program. In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. 3. You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following: a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.) The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. 
If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code. 4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 5. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it. 6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License. 7. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program. If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. 8. 
If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. 9. The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation. 10. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. END OF TERMS AND CONDITIONS How to Apply These Terms to Your New Programs If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. 
Copyright (C) This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. Also add information on how to contact you by electronic and paper mail. If the program is interactive, make it output a short notice like this when it starts in an interactive mode: Gnomovision version 69, Copyright (C) year name of author Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details. The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, the commands you use may be called something other than `show w' and `show c'; they could even be mouse-clicks or menu items--whatever suits your program. You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the program, if necessary. Here is a sample; alter the names: Yoyodyne, Inc., hereby disclaims all copyright interest in the program `Gnomovision' (which makes passes at compilers) written by James Hacker. , 1 April 1989 Ty Coon, President of Vice This General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Lesser General Public License instead of this License. 
././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/MANIFEST.in0000644000175000017500000000013315040344700014137 0ustar00mikemikeinclude README.rst CHANGELOG.md LICENSE scripts/run_tests.py recursive-include docs *.conf ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1753638554.1191192 gallery_dl-1.30.2/PKG-INFO0000644000175000017500000003743515041463232013520 0ustar00mikemikeMetadata-Version: 2.4 Name: gallery_dl Version: 1.30.2 Summary: Command-line program to download image galleries and collections from several image hosting sites Home-page: https://github.com/mikf/gallery-dl Download-URL: https://github.com/mikf/gallery-dl/releases/latest Author: Mike Fährmann Author-email: mike_faehrmann@web.de Maintainer: Mike Fährmann Maintainer-email: mike_faehrmann@web.de License: GPLv2 Keywords: image gallery downloader crawler scraper Classifier: Development Status :: 5 - Production/Stable Classifier: Environment :: Console Classifier: Intended Audience :: End Users/Desktop Classifier: License :: OSI Approved :: GNU General Public License v2 (GPLv2) Classifier: Operating System :: OS Independent Classifier: Programming Language :: Python Classifier: Programming Language :: Python :: 3 Classifier: Programming Language :: Python :: 3 :: Only Classifier: Programming Language :: Python :: 3.8 Classifier: Programming Language :: Python :: 3.9 Classifier: Programming Language :: Python :: 3.10 Classifier: Programming Language :: Python :: 3.11 Classifier: Programming Language :: Python :: 3.12 Classifier: Programming Language :: Python :: 3.13 Classifier: Programming Language :: Python :: Implementation :: CPython Classifier: Programming Language :: Python :: Implementation :: PyPy Classifier: Topic :: Internet :: WWW/HTTP Classifier: Topic :: Multimedia :: Graphics Classifier: Topic :: Utilities Requires-Python: >=3.8 License-File: LICENSE Requires-Dist: requests>=2.11.0 Provides-Extra: video Requires-Dist: yt-dlp; extra == "video" Provides-Extra: extra Requires-Dist: requests[socks]; extra == "extra" Requires-Dist: yt-dlp[default]; extra == "extra" Requires-Dist: pyyaml; extra == "extra" Requires-Dist: toml; python_version < "3.11" and extra == "extra" Requires-Dist: truststore; python_version >= "3.10" and extra == "extra" Requires-Dist: secretstorage; sys_platform == "linux" and extra == "extra" Dynamic: author Dynamic: author-email Dynamic: classifier Dynamic: description Dynamic: download-url Dynamic: home-page Dynamic: keywords Dynamic: license Dynamic: license-file Dynamic: maintainer Dynamic: maintainer-email Dynamic: provides-extra Dynamic: requires-dist Dynamic: requires-python Dynamic: summary ========== gallery-dl ========== *gallery-dl* is a command-line program to download image galleries and collections from several image hosting sites (see `Supported Sites `__). It is a cross-platform tool with many `configuration options `__ and powerful `filenaming capabilities `__. |pypi| |build| .. 
contents:: Dependencies ============ - Python_ 3.8+ - Requests_ Optional -------- - yt-dlp_ or youtube-dl_: HLS/DASH video downloads, ``ytdl`` integration - FFmpeg_: Pixiv Ugoira conversion - mkvmerge_: Accurate Ugoira frame timecodes - PySocks_: SOCKS proxy support - brotli_ or brotlicffi_: Brotli compression support - zstandard_: Zstandard compression support - PyYAML_: YAML configuration file support - toml_: TOML configuration file support for Python<3.11 - SecretStorage_: GNOME keyring passwords for ``--cookies-from-browser`` - Psycopg_: PostgreSQL archive support - truststore_: Native system certificate support - Jinja_: Jinja template support Installation ============ Pip --- The stable releases of *gallery-dl* are distributed on PyPI_ and can be easily installed or upgraded using pip_: .. code:: bash python3 -m pip install -U gallery-dl Installing the latest dev version directly from GitHub can be done with pip_ as well: .. code:: bash python3 -m pip install -U --force-reinstall --no-deps https://github.com/mikf/gallery-dl/archive/master.tar.gz Omit :code:`--no-deps` if Requests_ hasn't been installed yet. Note: Windows users should use :code:`py -3` instead of :code:`python3`. It is advised to use the latest version of pip_, including the essential packages :code:`setuptools` and :code:`wheel`. To ensure these packages are up-to-date, run .. code:: bash python3 -m pip install --upgrade pip setuptools wheel Standalone Executable --------------------- Prebuilt executable files with a Python interpreter and required Python packages included are available for - `Windows `__ (Requires `Microsoft Visual C++ Redistributable Package (x86) `__) - `Linux `__ Nightly Builds -------------- | Executables built from the latest commit can be found at | https://github.com/gdl-org/builds/releases Snap ---- Linux users that are using a distro that is supported by Snapd_ can install *gallery-dl* from the Snap Store: .. code:: bash snap install gallery-dl Chocolatey ---------- Windows users that have Chocolatey_ installed can install *gallery-dl* from the Chocolatey Community Packages repository: .. code:: powershell choco install gallery-dl Scoop ----- *gallery-dl* is also available in the Scoop_ "main" bucket for Windows users: .. code:: powershell scoop install gallery-dl Homebrew -------- For macOS or Linux users using Homebrew: .. code:: bash brew install gallery-dl MacPorts -------- For macOS users with MacPorts: .. code:: bash sudo port install gallery-dl Docker -------- Using the Dockerfile in the repository: .. code:: bash git clone https://github.com/mikf/gallery-dl.git cd gallery-dl/ docker build -t gallery-dl:latest . Pulling image from `Docker Hub `__: .. code:: bash docker pull mikf123/gallery-dl docker tag mikf123/gallery-dl gallery-dl Pulling image from `GitHub Container Registry `__: .. code:: bash docker pull ghcr.io/mikf/gallery-dl docker tag ghcr.io/mikf/gallery-dl gallery-dl To run the container you will probably want to attach some directories on the host so that the config file and downloads can persist across runs. Make sure to either download the example config file referenced in the repo and place it in the mounted volume location, or touch an empty file there. If you gave the container a different tag, or are using podman, make sure to adjust the commands accordingly. Run ``docker image ls`` to check the image name if you are not sure. The ``--rm`` flag in the command below removes the container after every use, so you always have a fresh environment to run it in. 
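Before the first run, it may help to prepare the host-side paths that the example command below mounts. This is a minimal sketch using the same example locations; adjust them to your own setup:

.. code:: bash

    # create the download directory and an (initially empty) config file on the host
    mkdir -p "$HOME/Downloads" "$HOME/.config/gallery-dl"
    touch "$HOME/.config/gallery-dl/gallery-dl.conf"

    # verify the image name/tag you built or pulled
    docker image ls
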
If you setup a ci-cd pipeline to autobuild the container you can also add a ``--pull=newer`` flag so that when you run it docker will check to see if there is a newer container and download it before running. .. code:: bash docker run --rm -v $HOME/Downloads/:/gallery-dl/ -v $HOME/.config/gallery-dl/gallery-dl.conf:/etc/gallery-dl.conf -it gallery-dl:latest You can also add an alias to your shell for "gallery-dl" or create a simple bash script and drop it somewhere in your $PATH to act as a shim for this command. Nix and Home Manager -------------------------- Adding *gallery-dl* to your system environment: .. code:: nix environment.systemPackages = with pkgs; [ gallery-dl ]; Using :code:`nix-shell` .. code:: bash nix-shell -p gallery-dl .. code:: bash nix-shell -p gallery-dl --run "gallery-dl " For Home Manager users, you can manage *gallery-dl* declaratively: .. code:: nix programs.gallery-dl = { enable = true; settings = { extractor.base-directory = "~/Downloads"; }; }; Alternatively, you can just add it to :code:`home.packages` if you don't want to manage it declaratively: .. code:: nix home.packages = with pkgs; [ gallery-dl ]; After making these changes, simply rebuild your configuration and open a new shell to have *gallery-dl* available. Usage ===== To use *gallery-dl* simply call it with the URLs you wish to download images from: .. code:: bash gallery-dl [OPTIONS]... URLS... Use :code:`gallery-dl --help` or see ``__ for a full list of all command-line options. Examples -------- Download images; in this case from danbooru via tag search for 'bonocho': .. code:: bash gallery-dl "https://danbooru.donmai.us/posts?tags=bonocho" Get the direct URL of an image from a site supporting authentication with username & password: .. code:: bash gallery-dl -g -u "" -p "" "https://twitter.com/i/web/status/604341487988576256" Filter manga chapters by chapter number and language: .. code:: bash gallery-dl --chapter-filter "10 <= chapter < 20" -o "lang=fr" "https://mangadex.org/title/59793dd0-a2d8-41a2-9758-8197287a8539" | Search a remote resource for URLs and download images from them: | (URLs for which no extractor can be found will be silently ignored) .. code:: bash gallery-dl "r:https://pastebin.com/raw/FLwrCYsT" If a site's address is nonstandard for its extractor, you can prefix the URL with the extractor's name to force the use of a specific extractor: .. code:: bash gallery-dl "tumblr:https://sometumblrblog.example" Configuration ============= Configuration files for *gallery-dl* use a JSON-based file format. Documentation ------------- A list of all available configuration options and their descriptions can be found at ``__. | For a default configuration file with available options set to their default values, see ``__. | For a commented example with more involved settings and option usage, see ``__. Locations --------- *gallery-dl* searches for configuration files in the following places: Windows: * ``%APPDATA%\gallery-dl\config.json`` * ``%USERPROFILE%\gallery-dl\config.json`` * ``%USERPROFILE%\gallery-dl.conf`` (``%USERPROFILE%`` usually refers to a user's home directory, i.e. ``C:\Users\\``) Linux, macOS, etc.: * ``/etc/gallery-dl.conf`` * ``${XDG_CONFIG_HOME}/gallery-dl/config.json`` * ``${HOME}/.config/gallery-dl/config.json`` * ``${HOME}/.gallery-dl.conf`` When run as `executable `__, *gallery-dl* will also look for a ``gallery-dl.conf`` file in the same directory as said executable. It is possible to use more than one configuration file at a time. 
In this case, any values from files after the first will get merged into the already loaded settings and potentially override previous ones. Authentication ============== Username & Password ------------------- Some extractors require you to provide valid login credentials in the form of a username & password pair. This is necessary for ``nijie`` and optional for ``aryion``, ``danbooru``, ``e621``, ``exhentai``, ``idolcomplex``, ``imgbb``, ``inkbunny``, ``mangadex``, ``mangoxo``, ``pillowfort``, ``sankaku``, ``subscribestar``, ``tapas``, ``tsumino``, ``twitter``, and ``zerochan``. You can set the necessary information in your `configuration file `__ .. code:: json { "extractor": { "twitter": { "username": "", "password": "" } } } or you can provide them directly via the :code:`-u/--username` and :code:`-p/--password` or via the :code:`-o/--option` command-line options .. code:: bash gallery-dl -u "" -p "" "URL" gallery-dl -o "username=" -o "password=" "URL" Cookies ------- For sites where login with username & password is not possible due to CAPTCHA or similar, or has not been implemented yet, you can use the cookies from a browser login session and input them into *gallery-dl*. This can be done via the `cookies `__ option in your configuration file by specifying - | the path to a Mozilla/Netscape format cookies.txt file exported by a browser addon | (e.g. `Get cookies.txt LOCALLY `__ for Chrome, `Export Cookies `__ for Firefox) - | a list of name-value pairs gathered from your browser's web developer tools | (in `Chrome `__, in `Firefox `__) - | the name of a browser to extract cookies from | (supported browsers are Chromium-based ones, Firefox, and Safari) For example: .. code:: json { "extractor": { "instagram": { "cookies": "$HOME/path/to/cookies.txt" }, "patreon": { "cookies": { "session_id": "K1T57EKu19TR49C51CDjOJoXNQLF7VbdVOiBrC9ye0a" } }, "twitter": { "cookies": ["firefox"] } } } | You can also specify a cookies.txt file with the :code:`--cookies` command-line option | or a browser to extract cookies from with :code:`--cookies-from-browser`: .. code:: bash gallery-dl --cookies "$HOME/path/to/cookies.txt" "URL" gallery-dl --cookies-from-browser firefox "URL" OAuth ----- *gallery-dl* supports user authentication via OAuth_ for some extractors. This is necessary for ``pixiv`` and optional for ``deviantart``, ``flickr``, ``reddit``, ``smugmug``, ``tumblr``, and ``mastodon`` instances. Linking your account to *gallery-dl* grants it the ability to issue requests on your account's behalf and enables it to access resources which would otherwise be unavailable to a public user. To do so, start by invoking it with ``oauth:`` as an argument. For example: .. code:: bash gallery-dl oauth:flickr You will be sent to the site's authorization page and asked to grant read access to *gallery-dl*. Authorize it and you will be shown one or more "tokens", which should be added to your configuration file. To authenticate with a ``mastodon`` instance, run *gallery-dl* with ``oauth:mastodon:`` as argument. For example: .. code:: bash gallery-dl oauth:mastodon:pawoo.net gallery-dl oauth:mastodon:https://mastodon.social/ .. _Python: https://www.python.org/downloads/ .. _PyPI: https://pypi.org/ .. _pip: https://pip.pypa.io/en/stable/ .. _Requests: https://requests.readthedocs.io/en/latest/ .. _FFmpeg: https://www.ffmpeg.org/ .. _mkvmerge: https://www.matroska.org/downloads/mkvtoolnix.html .. _yt-dlp: https://github.com/yt-dlp/yt-dlp .. _youtube-dl: https://ytdl-org.github.io/youtube-dl/ .. 
_PySocks: https://pypi.org/project/PySocks/ .. _brotli: https://github.com/google/brotli .. _brotlicffi: https://github.com/python-hyper/brotlicffi .. _zstandard: https://github.com/indygreg/python-zstandard .. _PyYAML: https://pyyaml.org/ .. _toml: https://pypi.org/project/toml/ .. _SecretStorage: https://pypi.org/project/SecretStorage/ .. _Psycopg: https://www.psycopg.org/ .. _truststore: https://truststore.readthedocs.io/en/latest/ .. _Jinja: https://jinja.palletsprojects.com/ .. _Snapd: https://docs.snapcraft.io/installing-snapd .. _OAuth: https://en.wikipedia.org/wiki/OAuth .. _Chocolatey: https://chocolatey.org/install .. _Scoop: https://scoop.sh/ .. |pypi| image:: https://img.shields.io/pypi/v/gallery-dl.svg :target: https://pypi.org/project/gallery-dl/ .. |build| image:: https://github.com/mikf/gallery-dl/workflows/tests/badge.svg :target: https://github.com/mikf/gallery-dl/actions .. |gitter| image:: https://badges.gitter.im/gallery-dl/main.svg :target: https://gitter.im/gallery-dl/main ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753638551.0 gallery_dl-1.30.2/README.rst0000644000175000017500000003260715041463227014112 0ustar00mikemike========== gallery-dl ========== *gallery-dl* is a command-line program to download image galleries and collections from several image hosting sites (see `Supported Sites `__). It is a cross-platform tool with many `configuration options `__ and powerful `filenaming capabilities `__. |pypi| |build| .. contents:: Dependencies ============ - Python_ 3.8+ - Requests_ Optional -------- - yt-dlp_ or youtube-dl_: HLS/DASH video downloads, ``ytdl`` integration - FFmpeg_: Pixiv Ugoira conversion - mkvmerge_: Accurate Ugoira frame timecodes - PySocks_: SOCKS proxy support - brotli_ or brotlicffi_: Brotli compression support - zstandard_: Zstandard compression support - PyYAML_: YAML configuration file support - toml_: TOML configuration file support for Python<3.11 - SecretStorage_: GNOME keyring passwords for ``--cookies-from-browser`` - Psycopg_: PostgreSQL archive support - truststore_: Native system certificate support - Jinja_: Jinja template support Installation ============ Pip --- The stable releases of *gallery-dl* are distributed on PyPI_ and can be easily installed or upgraded using pip_: .. code:: bash python3 -m pip install -U gallery-dl Installing the latest dev version directly from GitHub can be done with pip_ as well: .. code:: bash python3 -m pip install -U --force-reinstall --no-deps https://github.com/mikf/gallery-dl/archive/master.tar.gz Omit :code:`--no-deps` if Requests_ hasn't been installed yet. Note: Windows users should use :code:`py -3` instead of :code:`python3`. It is advised to use the latest version of pip_, including the essential packages :code:`setuptools` and :code:`wheel`. To ensure these packages are up-to-date, run .. code:: bash python3 -m pip install --upgrade pip setuptools wheel Standalone Executable --------------------- Prebuilt executable files with a Python interpreter and required Python packages included are available for - `Windows `__ (Requires `Microsoft Visual C++ Redistributable Package (x86) `__) - `Linux `__ Nightly Builds -------------- | Executables build from the latest commit can be found at | https://github.com/gdl-org/builds/releases Snap ---- Linux users that are using a distro that is supported by Snapd_ can install *gallery-dl* from the Snap Store: .. 
code:: bash snap install gallery-dl Chocolatey ---------- Windows users that have Chocolatey_ installed can install *gallery-dl* from the Chocolatey Community Packages repository: .. code:: powershell choco install gallery-dl Scoop ----- *gallery-dl* is also available in the Scoop_ "main" bucket for Windows users: .. code:: powershell scoop install gallery-dl Homebrew -------- For macOS or Linux users using Homebrew: .. code:: bash brew install gallery-dl MacPorts -------- For macOS users with MacPorts: .. code:: bash sudo port install gallery-dl Docker -------- Using the Dockerfile in the repository: .. code:: bash git clone https://github.com/mikf/gallery-dl.git cd gallery-dl/ docker build -t gallery-dl:latest . Pulling image from `Docker Hub `__: .. code:: bash docker pull mikf123/gallery-dl docker tag mikf123/gallery-dl gallery-dl Pulling image from `GitHub Container Registry `__: .. code:: bash docker pull ghcr.io/mikf/gallery-dl docker tag ghcr.io/mikf/gallery-dl gallery-dl To run the container you will probably want to attach some directories on the host so that the config file and downloads can persist across runs. Make sure to either download the example config file reference in the repo and place it in the mounted volume location or touch an empty file there. If you gave the container a different tag or are using podman then make sure you adjust. Run ``docker image ls`` to check the name if you are not sure. This will remove the container after every use so you will always have a fresh environment for it to run. If you setup a ci-cd pipeline to autobuild the container you can also add a ``--pull=newer`` flag so that when you run it docker will check to see if there is a newer container and download it before running. .. code:: bash docker run --rm -v $HOME/Downloads/:/gallery-dl/ -v $HOME/.config/gallery-dl/gallery-dl.conf:/etc/gallery-dl.conf -it gallery-dl:latest You can also add an alias to your shell for "gallery-dl" or create a simple bash script and drop it somewhere in your $PATH to act as a shim for this command. Nix and Home Manager -------------------------- Adding *gallery-dl* to your system environment: .. code:: nix environment.systemPackages = with pkgs; [ gallery-dl ]; Using :code:`nix-shell` .. code:: bash nix-shell -p gallery-dl .. code:: bash nix-shell -p gallery-dl --run "gallery-dl " For Home Manager users, you can manage *gallery-dl* declaratively: .. code:: nix programs.gallery-dl = { enable = true; settings = { extractor.base-directory = "~/Downloads"; }; }; Alternatively, you can just add it to :code:`home.packages` if you don't want to manage it declaratively: .. code:: nix home.packages = with pkgs; [ gallery-dl ]; After making these changes, simply rebuild your configuration and open a new shell to have *gallery-dl* available. Usage ===== To use *gallery-dl* simply call it with the URLs you wish to download images from: .. code:: bash gallery-dl [OPTIONS]... URLS... Use :code:`gallery-dl --help` or see ``__ for a full list of all command-line options. Examples -------- Download images; in this case from danbooru via tag search for 'bonocho': .. code:: bash gallery-dl "https://danbooru.donmai.us/posts?tags=bonocho" Get the direct URL of an image from a site supporting authentication with username & password: .. code:: bash gallery-dl -g -u "" -p "" "https://twitter.com/i/web/status/604341487988576256" Filter manga chapters by chapter number and language: .. 
code:: bash gallery-dl --chapter-filter "10 <= chapter < 20" -o "lang=fr" "https://mangadex.org/title/59793dd0-a2d8-41a2-9758-8197287a8539" | Search a remote resource for URLs and download images from them: | (URLs for which no extractor can be found will be silently ignored) .. code:: bash gallery-dl "r:https://pastebin.com/raw/FLwrCYsT" If a site's address is nonstandard for its extractor, you can prefix the URL with the extractor's name to force the use of a specific extractor: .. code:: bash gallery-dl "tumblr:https://sometumblrblog.example" Configuration ============= Configuration files for *gallery-dl* use a JSON-based file format. Documentation ------------- A list of all available configuration options and their descriptions can be found at ``__. | For a default configuration file with available options set to their default values, see ``__. | For a commented example with more involved settings and option usage, see ``__. Locations --------- *gallery-dl* searches for configuration files in the following places: Windows: * ``%APPDATA%\gallery-dl\config.json`` * ``%USERPROFILE%\gallery-dl\config.json`` * ``%USERPROFILE%\gallery-dl.conf`` (``%USERPROFILE%`` usually refers to a user's home directory, i.e. ``C:\Users\\``) Linux, macOS, etc.: * ``/etc/gallery-dl.conf`` * ``${XDG_CONFIG_HOME}/gallery-dl/config.json`` * ``${HOME}/.config/gallery-dl/config.json`` * ``${HOME}/.gallery-dl.conf`` When run as `executable `__, *gallery-dl* will also look for a ``gallery-dl.conf`` file in the same directory as said executable. It is possible to use more than one configuration file at a time. In this case, any values from files after the first will get merged into the already loaded settings and potentially override previous ones. Authentication ============== Username & Password ------------------- Some extractors require you to provide valid login credentials in the form of a username & password pair. This is necessary for ``nijie`` and optional for ``aryion``, ``danbooru``, ``e621``, ``exhentai``, ``idolcomplex``, ``imgbb``, ``inkbunny``, ``mangadex``, ``mangoxo``, ``pillowfort``, ``sankaku``, ``subscribestar``, ``tapas``, ``tsumino``, ``twitter``, and ``zerochan``. You can set the necessary information in your `configuration file `__ .. code:: json { "extractor": { "twitter": { "username": "", "password": "" } } } or you can provide them directly via the :code:`-u/--username` and :code:`-p/--password` or via the :code:`-o/--option` command-line options .. code:: bash gallery-dl -u "" -p "" "URL" gallery-dl -o "username=" -o "password=" "URL" Cookies ------- For sites where login with username & password is not possible due to CAPTCHA or similar, or has not been implemented yet, you can use the cookies from a browser login session and input them into *gallery-dl*. This can be done via the `cookies `__ option in your configuration file by specifying - | the path to a Mozilla/Netscape format cookies.txt file exported by a browser addon | (e.g. `Get cookies.txt LOCALLY `__ for Chrome, `Export Cookies `__ for Firefox) - | a list of name-value pairs gathered from your browser's web developer tools | (in `Chrome `__, in `Firefox `__) - | the name of a browser to extract cookies from | (supported browsers are Chromium-based ones, Firefox, and Safari) For example: .. 
code:: json { "extractor": { "instagram": { "cookies": "$HOME/path/to/cookies.txt" }, "patreon": { "cookies": { "session_id": "K1T57EKu19TR49C51CDjOJoXNQLF7VbdVOiBrC9ye0a" } }, "twitter": { "cookies": ["firefox"] } } } | You can also specify a cookies.txt file with the :code:`--cookies` command-line option | or a browser to extract cookies from with :code:`--cookies-from-browser`: .. code:: bash gallery-dl --cookies "$HOME/path/to/cookies.txt" "URL" gallery-dl --cookies-from-browser firefox "URL" OAuth ----- *gallery-dl* supports user authentication via OAuth_ for some extractors. This is necessary for ``pixiv`` and optional for ``deviantart``, ``flickr``, ``reddit``, ``smugmug``, ``tumblr``, and ``mastodon`` instances. Linking your account to *gallery-dl* grants it the ability to issue requests on your account's behalf and enables it to access resources which would otherwise be unavailable to a public user. To do so, start by invoking it with ``oauth:`` as an argument. For example: .. code:: bash gallery-dl oauth:flickr You will be sent to the site's authorization page and asked to grant read access to *gallery-dl*. Authorize it and you will be shown one or more "tokens", which should be added to your configuration file. To authenticate with a ``mastodon`` instance, run *gallery-dl* with ``oauth:mastodon:`` as argument. For example: .. code:: bash gallery-dl oauth:mastodon:pawoo.net gallery-dl oauth:mastodon:https://mastodon.social/ .. _Python: https://www.python.org/downloads/ .. _PyPI: https://pypi.org/ .. _pip: https://pip.pypa.io/en/stable/ .. _Requests: https://requests.readthedocs.io/en/latest/ .. _FFmpeg: https://www.ffmpeg.org/ .. _mkvmerge: https://www.matroska.org/downloads/mkvtoolnix.html .. _yt-dlp: https://github.com/yt-dlp/yt-dlp .. _youtube-dl: https://ytdl-org.github.io/youtube-dl/ .. _PySocks: https://pypi.org/project/PySocks/ .. _brotli: https://github.com/google/brotli .. _brotlicffi: https://github.com/python-hyper/brotlicffi .. _zstandard: https://github.com/indygreg/python-zstandard .. _PyYAML: https://pyyaml.org/ .. _toml: https://pypi.org/project/toml/ .. _SecretStorage: https://pypi.org/project/SecretStorage/ .. _Psycopg: https://www.psycopg.org/ .. _truststore: https://truststore.readthedocs.io/en/latest/ .. _Jinja: https://jinja.palletsprojects.com/ .. _Snapd: https://docs.snapcraft.io/installing-snapd .. _OAuth: https://en.wikipedia.org/wiki/OAuth .. _Chocolatey: https://chocolatey.org/install .. _Scoop: https://scoop.sh/ .. |pypi| image:: https://img.shields.io/pypi/v/gallery-dl.svg :target: https://pypi.org/project/gallery-dl/ .. |build| image:: https://github.com/mikf/gallery-dl/workflows/tests/badge.svg :target: https://github.com/mikf/gallery-dl/actions .. 
|gitter| image:: https://badges.gitter.im/gallery-dl/main.svg :target: https://gitter.im/gallery-dl/main ././@PaxHeader0000000000000000000000000000003300000000000010211 xustar0027 mtime=1753638554.033117 gallery_dl-1.30.2/data/0000755000175000017500000000000015041463232013320 5ustar00mikemike././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1753638554.0373418 gallery_dl-1.30.2/data/completion/0000755000175000017500000000000015041463232015471 5ustar00mikemike././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753373695.0 gallery_dl-1.30.2/data/completion/_gallery-dl0000644000175000017500000002001515040455777017623 0ustar00mikemike#compdef gallery-dl local curcontext="$curcontext" typeset -A opt_args local rc=1 _arguments -s -S \ {-h,--help}'[Print this help message and exit]' \ --version'[Print program version and exit]' \ {-f,--filename}'[Filename format string for downloaded files ('\''/O'\'' for "original" filenames)]':'' \ {-d,--destination}'[Target location for file downloads]':'' \ {-D,--directory}'[Exact location for file downloads]':'' \ {-X,--extractors}'[Load external extractors from PATH]':'' \ --user-agent'[User-Agent request header]':'' \ --clear-cache'[Delete cached login sessions, cookies, etc. for MODULE (ALL to delete everything)]':'' \ --compat'[Restore legacy '\''category'\'' names]' \ {-U,--update-check}'[Check if a newer version is available]' \ {-i,--input-file}'[Download URLs found in FILE ('\''-'\'' for stdin). More than one --input-file can be specified]':'':_files \ {-I,--input-file-comment}'[Download URLs found in FILE. Comment them out after they were downloaded successfully.]':'':_files \ {-x,--input-file-delete}'[Download URLs found in FILE. Delete them after they were downloaded successfully.]':'':_files \ --no-input'[Do not prompt for passwords/tokens]' \ {-q,--quiet}'[Activate quiet mode]' \ {-w,--warning}'[Print only warnings and errors]' \ {-v,--verbose}'[Print various debugging information]' \ {-g,--get-urls}'[Print URLs instead of downloading]' \ {-G,--resolve-urls}'[Print URLs instead of downloading; resolve intermediary URLs]' \ {-j,--dump-json}'[Print JSON information]' \ {-J,--resolve-json}'[Print JSON information; resolve intermediary URLs]' \ {-s,--simulate}'[Simulate data extraction; do not download anything]' \ {-E,--extractor-info}'[Print extractor defaults and settings]' \ {-K,--list-keywords}'[Print a list of available keywords and example values for the given URLs]' \ {-e,--error-file}'[Add input URLs which returned an error to FILE]':'':_files \ {-N,--print}'[Write FORMAT during EVENT (default '\''prepare'\'') to standard output instead of downloading files. Can be used multiple times. Examples: '\''id'\'' or '\''post:{md5\[:8\]}'\'']':'<[event:]format>' \ --Print'[Like --print, but downloads files as well]':'<[event:]format>' \ --print-to-file'[Append FORMAT during EVENT to FILE instead of downloading files. 
Can be used multiple times]':'<[event:]format file>' \ --Print-to-file'[Like --print-to-file, but downloads files as well]':'<[event:]format file>' \ --list-modules'[Print a list of available extractor modules]' \ --list-extractors'[Print a list of extractor classes with description, (sub)category and example URL]':'<[categories]>' \ --write-log'[Write logging output to FILE]':'':_files \ --write-unsupported'[Write URLs, which get emitted by other extractors but cannot be handled, to FILE]':'':_files \ --write-pages'[Write downloaded intermediary pages to files in the current directory to debug problems]' \ --print-traffic'[Display sent and read HTTP traffic]' \ --no-colors'[Do not emit ANSI color codes in output]' \ {-R,--retries}'[Maximum number of retries for failed HTTP requests or -1 for infinite retries (default: 4)]':'' \ --http-timeout'[Timeout for HTTP connections (default: 30.0)]':'' \ --proxy'[Use the specified proxy]':'' \ --source-address'[Client-side IP address to bind to]':'' \ {-4,--force-ipv4}'[Make all connections via IPv4]' \ {-6,--force-ipv6}'[Make all connections via IPv6]' \ --no-check-certificate'[Disable HTTPS certificate validation]' \ {-r,--limit-rate}'[Maximum download rate (e.g. 500k, 2.5M, or 800k-2M)]':'' \ --chunk-size'[Size of in-memory data chunks (default: 32k)]':'' \ --sleep'[Number of seconds to wait before each download. This can be either a constant value or a range (e.g. 2.7 or 2.0-3.5)]':'' \ --sleep-request'[Number of seconds to wait between HTTP requests during data extraction]':'' \ --sleep-429'[Number of seconds to wait when receiving a '\''429 Too Many Requests'\'' response]':'' \ --sleep-extractor'[Number of seconds to wait before starting data extraction for an input URL]':'' \ --no-part'[Do not use .part files]' \ --no-skip'[Do not skip downloads; overwrite existing files]' \ --no-mtime'[Do not set file modification times according to Last-Modified HTTP response headers]' \ --no-download'[Do not download any files]' \ {-o,--option}'[Additional options. Example: -o browser=firefox]':'' \ {-c,--config}'[Additional configuration files]':'':_files \ --config-yaml'[Additional configuration files in YAML format]':'':_files \ --config-toml'[Additional configuration files in TOML format]':'':_files \ --config-create'[Create a basic configuration file]' \ --config-status'[Show configuration file status]' \ --config-open'[Open configuration file in external application]' \ --config-ignore'[Do not read default configuration files]' \ {-u,--username}'[Username to login with]':'' \ {-p,--password}'[Password belonging to the given username]':'' \ --netrc'[Enable .netrc authentication data]' \ {-C,--cookies}'[File to load additional cookies from]':'':_files \ --cookies-export'[Export session cookies to FILE]':'':_files \ --cookies-from-browser'[Name of the browser to load cookies from, with optional domain prefixed with '\''/'\'', keyring name prefixed with '\''+'\'', profile prefixed with '\'':'\'', and container prefixed with '\''::'\'' ('\''none'\'' for no container (default), '\''all'\'' for all containers)]':'' \ {-A,--abort}'[Stop current extractor run after N consecutive file downloads were skipped]':'' \ {-T,--terminate}'[Stop current and parent extractor runs after N consecutive file downloads were skipped]':'' \ --filesize-min'[Do not download files smaller than SIZE (e.g. 500k or 2.5M)]':'' \ --filesize-max'[Do not download files larger than SIZE (e.g. 
500k or 2.5M)]':'' \ --download-archive'[Record successfully downloaded files in FILE and skip downloading any file already in it]':'':_files \ --range'[Index range(s) specifying which files to download. These can be either a constant value, range, or slice (e.g. '\''5'\'', '\''8-20'\'', or '\''1:24:3'\'')]':'' \ --chapter-range'[Like '\''--range'\'', but applies to manga chapters and other delegated URLs]':'' \ --filter'[Python expression controlling which files to download. Files for which the expression evaluates to False are ignored. Available keys are the filename-specific ones listed by '\''-K'\''. Example: --filter "image_width >= 1000 and rating in ('\''s'\'', '\''q'\'')"]':'' \ --chapter-filter'[Like '\''--filter'\'', but applies to manga chapters and other delegated URLs]':'' \ {-P,--postprocessor}'[Activate the specified post processor]':'' \ --no-postprocessors'[Do not run any post processors]' \ {-O,--postprocessor-option}'[Additional post processor options]':'' \ --write-metadata'[Write metadata to separate JSON files]' \ --write-info-json'[Write gallery metadata to a info.json file]' \ --write-tags'[Write image tags to separate text files]' \ --zip'[Store downloaded files in a ZIP archive]' \ --cbz'[Store downloaded files in a CBZ archive]' \ --mtime'[Set file modification times according to metadata selected by NAME. Examples: '\''date'\'' or '\''status\[date\]'\'']':'' \ --rename'[Rename previously downloaded files from FORMAT to the current filename format]':'' \ --rename-to'[Rename previously downloaded files from the current filename format to FORMAT]':'' \ --ugoira'[Convert Pixiv Ugoira to FMT using FFmpeg. Supported formats are '\''webm'\'', '\''mp4'\'', '\''gif'\'', '\''vp8'\'', '\''vp9'\'', '\''vp9-lossless'\'', '\''copy'\'', '\''zip'\''.]':'' \ --exec'[Execute CMD for each downloaded file. Supported replacement fields are {} or {_path}, {_directory}, {_filename}. Example: --exec "convert {} {}.png && rm {}"]':'' \ --exec-after'[Execute CMD after all files were downloaded. 
Example: --exec-after "cd {_directory} && convert * ../doc.pdf"]':'' && rc=0 return rc ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753373695.0 gallery_dl-1.30.2/data/completion/gallery-dl0000644000175000017500000000354615040455777017476 0ustar00mikemike_gallery_dl() { local cur prev COMPREPLY=() cur="${COMP_WORDS[COMP_CWORD]}" prev="${COMP_WORDS[COMP_CWORD-1]}" if [[ "${prev}" =~ ^(-i|--input-file|-I|--input-file-comment|-x|--input-file-delete|-e|--error-file|--write-log|--write-unsupported|-c|--config|--config-yaml|--config-toml|-C|--cookies|--cookies-export|--download-archive)$ ]]; then COMPREPLY=( $(compgen -f -- "${cur}") ) elif [[ "${prev}" =~ ^()$ ]]; then COMPREPLY=( $(compgen -d -- "${cur}") ) else COMPREPLY=( $(compgen -W "--help --version --filename --destination --directory --extractors --user-agent --clear-cache --compat --update-check --input-file --input-file-comment --input-file-delete --no-input --quiet --warning --verbose --get-urls --resolve-urls --dump-json --resolve-json --simulate --extractor-info --list-keywords --error-file --print --Print --print-to-file --Print-to-file --list-modules --list-extractors --write-log --write-unsupported --write-pages --print-traffic --no-colors --retries --http-timeout --proxy --source-address --force-ipv4 --force-ipv6 --no-check-certificate --limit-rate --chunk-size --sleep --sleep-request --sleep-429 --sleep-extractor --no-part --no-skip --no-mtime --no-download --option --config --config-yaml --config-toml --config-create --config-status --config-open --config-ignore --ignore-config --username --password --netrc --cookies --cookies-export --cookies-from-browser --abort --terminate --filesize-min --filesize-max --download-archive --range --chapter-range --filter --chapter-filter --postprocessor --no-postprocessors --postprocessor-option --write-metadata --write-info-json --write-infojson --write-tags --zip --cbz --mtime --mtime-from-date --rename --rename-to --ugoira --ugoira-conv --ugoira-conv-lossless --ugoira-conv-copy --exec --exec-after" -- "${cur}") ) fi } complete -F _gallery_dl gallery-dl ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753373695.0 gallery_dl-1.30.2/data/completion/gallery-dl.fish0000644000175000017500000002411615040455777020422 0ustar00mikemikecomplete -c gallery-dl -x complete -c gallery-dl -s 'h' -l 'help' -d 'Print this help message and exit' complete -c gallery-dl -l 'version' -d 'Print program version and exit' complete -c gallery-dl -x -s 'f' -l 'filename' -d 'Filename format string for downloaded files ("/O" for "original" filenames)' complete -c gallery-dl -x -a '(__fish_complete_directories)' -s 'd' -l 'destination' -d 'Target location for file downloads' complete -c gallery-dl -x -a '(__fish_complete_directories)' -s 'D' -l 'directory' -d 'Exact location for file downloads' complete -c gallery-dl -x -a '(__fish_complete_directories)' -s 'X' -l 'extractors' -d 'Load external extractors from PATH' complete -c gallery-dl -x -l 'user-agent' -d 'User-Agent request header' complete -c gallery-dl -x -l 'clear-cache' -d 'Delete cached login sessions, cookies, etc. for MODULE (ALL to delete everything)' complete -c gallery-dl -l 'compat' -d 'Restore legacy "category" names' complete -c gallery-dl -s 'U' -l 'update-check' -d 'Check if a newer version is available' complete -c gallery-dl -r -F -s 'i' -l 'input-file' -d 'Download URLs found in FILE ("-" for stdin). 
More than one --input-file can be specified' complete -c gallery-dl -r -F -s 'I' -l 'input-file-comment' -d 'Download URLs found in FILE. Comment them out after they were downloaded successfully.' complete -c gallery-dl -r -F -s 'x' -l 'input-file-delete' -d 'Download URLs found in FILE. Delete them after they were downloaded successfully.' complete -c gallery-dl -l 'no-input' -d 'Do not prompt for passwords/tokens' complete -c gallery-dl -s 'q' -l 'quiet' -d 'Activate quiet mode' complete -c gallery-dl -s 'w' -l 'warning' -d 'Print only warnings and errors' complete -c gallery-dl -s 'v' -l 'verbose' -d 'Print various debugging information' complete -c gallery-dl -s 'g' -l 'get-urls' -d 'Print URLs instead of downloading' complete -c gallery-dl -s 'G' -l 'resolve-urls' -d 'Print URLs instead of downloading; resolve intermediary URLs' complete -c gallery-dl -s 'j' -l 'dump-json' -d 'Print JSON information' complete -c gallery-dl -s 'J' -l 'resolve-json' -d 'Print JSON information; resolve intermediary URLs' complete -c gallery-dl -s 's' -l 'simulate' -d 'Simulate data extraction; do not download anything' complete -c gallery-dl -s 'E' -l 'extractor-info' -d 'Print extractor defaults and settings' complete -c gallery-dl -s 'K' -l 'list-keywords' -d 'Print a list of available keywords and example values for the given URLs' complete -c gallery-dl -r -F -s 'e' -l 'error-file' -d 'Add input URLs which returned an error to FILE' complete -c gallery-dl -x -s 'N' -l 'print' -d 'Write FORMAT during EVENT (default "prepare") to standard output instead of downloading files. Can be used multiple times. Examples: "id" or "post:{md5[:8]}"' complete -c gallery-dl -x -l 'Print' -d 'Like --print, but downloads files as well' complete -c gallery-dl -x -l 'print-to-file' -d 'Append FORMAT during EVENT to FILE instead of downloading files. Can be used multiple times' complete -c gallery-dl -x -l 'Print-to-file' -d 'Like --print-to-file, but downloads files as well' complete -c gallery-dl -l 'list-modules' -d 'Print a list of available extractor modules' complete -c gallery-dl -x -l 'list-extractors' -d 'Print a list of extractor classes with description, (sub)category and example URL' complete -c gallery-dl -r -F -l 'write-log' -d 'Write logging output to FILE' complete -c gallery-dl -r -F -l 'write-unsupported' -d 'Write URLs, which get emitted by other extractors but cannot be handled, to FILE' complete -c gallery-dl -l 'write-pages' -d 'Write downloaded intermediary pages to files in the current directory to debug problems' complete -c gallery-dl -l 'print-traffic' -d 'Display sent and read HTTP traffic' complete -c gallery-dl -l 'no-colors' -d 'Do not emit ANSI color codes in output' complete -c gallery-dl -x -s 'R' -l 'retries' -d 'Maximum number of retries for failed HTTP requests or -1 for infinite retries (default: 4)' complete -c gallery-dl -x -l 'http-timeout' -d 'Timeout for HTTP connections (default: 30.0)' complete -c gallery-dl -x -l 'proxy' -d 'Use the specified proxy' complete -c gallery-dl -x -l 'source-address' -d 'Client-side IP address to bind to' complete -c gallery-dl -s '4' -l 'force-ipv4' -d 'Make all connections via IPv4' complete -c gallery-dl -s '6' -l 'force-ipv6' -d 'Make all connections via IPv6' complete -c gallery-dl -l 'no-check-certificate' -d 'Disable HTTPS certificate validation' complete -c gallery-dl -x -s 'r' -l 'limit-rate' -d 'Maximum download rate (e.g. 
500k, 2.5M, or 800k-2M)' complete -c gallery-dl -x -l 'chunk-size' -d 'Size of in-memory data chunks (default: 32k)' complete -c gallery-dl -x -l 'sleep' -d 'Number of seconds to wait before each download. This can be either a constant value or a range (e.g. 2.7 or 2.0-3.5)' complete -c gallery-dl -x -l 'sleep-request' -d 'Number of seconds to wait between HTTP requests during data extraction' complete -c gallery-dl -x -l 'sleep-429' -d 'Number of seconds to wait when receiving a "429 Too Many Requests" response' complete -c gallery-dl -x -l 'sleep-extractor' -d 'Number of seconds to wait before starting data extraction for an input URL' complete -c gallery-dl -l 'no-part' -d 'Do not use .part files' complete -c gallery-dl -l 'no-skip' -d 'Do not skip downloads; overwrite existing files' complete -c gallery-dl -l 'no-mtime' -d 'Do not set file modification times according to Last-Modified HTTP response headers' complete -c gallery-dl -l 'no-download' -d 'Do not download any files' complete -c gallery-dl -x -s 'o' -l 'option' -d 'Additional options. Example: -o browser=firefox' complete -c gallery-dl -r -F -s 'c' -l 'config' -d 'Additional configuration files' complete -c gallery-dl -r -F -l 'config-yaml' -d 'Additional configuration files in YAML format' complete -c gallery-dl -r -F -l 'config-toml' -d 'Additional configuration files in TOML format' complete -c gallery-dl -l 'config-create' -d 'Create a basic configuration file' complete -c gallery-dl -l 'config-status' -d 'Show configuration file status' complete -c gallery-dl -l 'config-open' -d 'Open configuration file in external application' complete -c gallery-dl -l 'config-ignore' -d 'Do not read default configuration files' complete -c gallery-dl -l 'ignore-config' -d '==SUPPRESS==' complete -c gallery-dl -x -s 'u' -l 'username' -d 'Username to login with' complete -c gallery-dl -x -s 'p' -l 'password' -d 'Password belonging to the given username' complete -c gallery-dl -l 'netrc' -d 'Enable .netrc authentication data' complete -c gallery-dl -r -F -s 'C' -l 'cookies' -d 'File to load additional cookies from' complete -c gallery-dl -r -F -l 'cookies-export' -d 'Export session cookies to FILE' complete -c gallery-dl -x -l 'cookies-from-browser' -d 'Name of the browser to load cookies from, with optional domain prefixed with "/", keyring name prefixed with "+", profile prefixed with ":", and container prefixed with "::" ("none" for no container (default), "all" for all containers)' complete -c gallery-dl -x -s 'A' -l 'abort' -d 'Stop current extractor run after N consecutive file downloads were skipped' complete -c gallery-dl -x -s 'T' -l 'terminate' -d 'Stop current and parent extractor runs after N consecutive file downloads were skipped' complete -c gallery-dl -x -l 'filesize-min' -d 'Do not download files smaller than SIZE (e.g. 500k or 2.5M)' complete -c gallery-dl -x -l 'filesize-max' -d 'Do not download files larger than SIZE (e.g. 500k or 2.5M)' complete -c gallery-dl -r -F -l 'download-archive' -d 'Record successfully downloaded files in FILE and skip downloading any file already in it' complete -c gallery-dl -x -l 'range' -d 'Index range(s) specifying which files to download. These can be either a constant value, range, or slice (e.g. "5", "8-20", or "1:24:3")' complete -c gallery-dl -x -l 'chapter-range' -d 'Like "--range", but applies to manga chapters and other delegated URLs' complete -c gallery-dl -x -l 'filter' -d 'Python expression controlling which files to download. 
Files for which the expression evaluates to False are ignored. Available keys are the filename-specific ones listed by "-K". Example: --filter "image_width >= 1000 and rating in ("s", "q")"' complete -c gallery-dl -x -l 'chapter-filter' -d 'Like "--filter", but applies to manga chapters and other delegated URLs' complete -c gallery-dl -x -s 'P' -l 'postprocessor' -d 'Activate the specified post processor' complete -c gallery-dl -l 'no-postprocessors' -d 'Do not run any post processors' complete -c gallery-dl -x -s 'O' -l 'postprocessor-option' -d 'Additional post processor options' complete -c gallery-dl -l 'write-metadata' -d 'Write metadata to separate JSON files' complete -c gallery-dl -l 'write-info-json' -d 'Write gallery metadata to a info.json file' complete -c gallery-dl -l 'write-infojson' -d '==SUPPRESS==' complete -c gallery-dl -l 'write-tags' -d 'Write image tags to separate text files' complete -c gallery-dl -l 'zip' -d 'Store downloaded files in a ZIP archive' complete -c gallery-dl -l 'cbz' -d 'Store downloaded files in a CBZ archive' complete -c gallery-dl -x -l 'mtime' -d 'Set file modification times according to metadata selected by NAME. Examples: "date" or "status[date]"' complete -c gallery-dl -l 'mtime-from-date' -d '==SUPPRESS==' complete -c gallery-dl -x -l 'rename' -d 'Rename previously downloaded files from FORMAT to the current filename format' complete -c gallery-dl -x -l 'rename-to' -d 'Rename previously downloaded files from the current filename format to FORMAT' complete -c gallery-dl -x -l 'ugoira' -d 'Convert Pixiv Ugoira to FMT using FFmpeg. Supported formats are "webm", "mp4", "gif", "vp8", "vp9", "vp9-lossless", "copy", "zip".' complete -c gallery-dl -l 'ugoira-conv' -d '==SUPPRESS==' complete -c gallery-dl -l 'ugoira-conv-lossless' -d '==SUPPRESS==' complete -c gallery-dl -l 'ugoira-conv-copy' -d '==SUPPRESS==' complete -c gallery-dl -x -l 'exec' -d 'Execute CMD for each downloaded file. Supported replacement fields are {} or {_path}, {_directory}, {_filename}. Example: --exec "convert {} {}.png && rm {}"' complete -c gallery-dl -x -l 'exec-after' -d 'Execute CMD after all files were downloaded. Example: --exec-after "cd {_directory} && convert * ../doc.pdf"' ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1753638554.0381193 gallery_dl-1.30.2/data/man/0000755000175000017500000000000015041463232014073 5ustar00mikemike././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753638551.0 gallery_dl-1.30.2/data/man/gallery-dl.10000644000175000017500000002400515041463227016216 0ustar00mikemike.TH "GALLERY-DL" "1" "2025-07-27" "1.30.2" "gallery-dl Manual" .\" disable hyphenation .nh .SH NAME gallery-dl \- download image-galleries and -collections .SH SYNOPSIS .B gallery-dl [OPTION]... URL... .SH DESCRIPTION .B gallery-dl is a command-line program to download image-galleries and -collections from several image hosting sites. It is a cross-platform tool with many configuration options and powerful filenaming capabilities. 
.SH OPTIONS .TP .B "\-h, \-\-help" Print this help message and exit .TP .B "\-\-version" Print program version and exit .TP .B "\-f, \-\-filename" \f[I]FORMAT\f[] Filename format string for downloaded files ('/O' for "original" filenames) .TP .B "\-d, \-\-destination" \f[I]PATH\f[] Target location for file downloads .TP .B "\-D, \-\-directory" \f[I]PATH\f[] Exact location for file downloads .TP .B "\-X, \-\-extractors" \f[I]PATH\f[] Load external extractors from PATH .TP .B "\-\-user\-agent" \f[I]UA\f[] User-Agent request header .TP .B "\-\-clear\-cache" \f[I]MODULE\f[] Delete cached login sessions, cookies, etc. for MODULE (ALL to delete everything) .TP .B "\-\-compat" Restore legacy 'category' names .TP .B "\-U, \-\-update\-check" Check if a newer version is available .TP .B "\-i, \-\-input\-file" \f[I]FILE\f[] Download URLs found in FILE ('-' for stdin). More than one --input-file can be specified .TP .B "\-I, \-\-input\-file\-comment" \f[I]FILE\f[] Download URLs found in FILE. Comment them out after they were downloaded successfully. .TP .B "\-x, \-\-input\-file\-delete" \f[I]FILE\f[] Download URLs found in FILE. Delete them after they were downloaded successfully. .TP .B "\-\-no\-input" Do not prompt for passwords/tokens .TP .B "\-q, \-\-quiet" Activate quiet mode .TP .B "\-w, \-\-warning" Print only warnings and errors .TP .B "\-v, \-\-verbose" Print various debugging information .TP .B "\-g, \-\-get\-urls" Print URLs instead of downloading .TP .B "\-G, \-\-resolve\-urls" Print URLs instead of downloading; resolve intermediary URLs .TP .B "\-j, \-\-dump\-json" Print JSON information .TP .B "\-J, \-\-resolve\-json" Print JSON information; resolve intermediary URLs .TP .B "\-s, \-\-simulate" Simulate data extraction; do not download anything .TP .B "\-E, \-\-extractor\-info" Print extractor defaults and settings .TP .B "\-K, \-\-list\-keywords" Print a list of available keywords and example values for the given URLs .TP .B "\-e, \-\-error\-file" \f[I]FILE\f[] Add input URLs which returned an error to FILE .TP .B "\-N, \-\-print" \f[I][EVENT:]FORMAT\f[] Write FORMAT during EVENT (default 'prepare') to standard output instead of downloading files. Can be used multiple times. Examples: 'id' or 'post:{md5[:8]}' .TP .B "\-\-Print" \f[I][EVENT:]FORMAT\f[] Like --print, but downloads files as well .TP .B "\-\-print\-to\-file" \f[I][EVENT:]FORMAT FILE\f[] Append FORMAT during EVENT to FILE instead of downloading files. 
Can be used multiple times .TP .B "\-\-Print\-to\-file" \f[I][EVENT:]FORMAT FILE\f[] Like --print-to-file, but downloads files as well .TP .B "\-\-list\-modules" Print a list of available extractor modules .TP .B "\-\-list\-extractors" \f[I][CATEGORIES]\f[] Print a list of extractor classes with description, (sub)category and example URL .TP .B "\-\-write\-log" \f[I]FILE\f[] Write logging output to FILE .TP .B "\-\-write\-unsupported" \f[I]FILE\f[] Write URLs, which get emitted by other extractors but cannot be handled, to FILE .TP .B "\-\-write\-pages" Write downloaded intermediary pages to files in the current directory to debug problems .TP .B "\-\-print\-traffic" Display sent and read HTTP traffic .TP .B "\-\-no\-colors" Do not emit ANSI color codes in output .TP .B "\-R, \-\-retries" \f[I]N\f[] Maximum number of retries for failed HTTP requests or -1 for infinite retries (default: 4) .TP .B "\-\-http\-timeout" \f[I]SECONDS\f[] Timeout for HTTP connections (default: 30.0) .TP .B "\-\-proxy" \f[I]URL\f[] Use the specified proxy .TP .B "\-\-source\-address" \f[I]IP\f[] Client-side IP address to bind to .TP .B "\-4, \-\-force\-ipv4" Make all connections via IPv4 .TP .B "\-6, \-\-force\-ipv6" Make all connections via IPv6 .TP .B "\-\-no\-check\-certificate" Disable HTTPS certificate validation .TP .B "\-r, \-\-limit\-rate" \f[I]RATE\f[] Maximum download rate (e.g. 500k, 2.5M, or 800k-2M) .TP .B "\-\-chunk\-size" \f[I]SIZE\f[] Size of in-memory data chunks (default: 32k) .TP .B "\-\-sleep" \f[I]SECONDS\f[] Number of seconds to wait before each download. This can be either a constant value or a range (e.g. 2.7 or 2.0-3.5) .TP .B "\-\-sleep\-request" \f[I]SECONDS\f[] Number of seconds to wait between HTTP requests during data extraction .TP .B "\-\-sleep\-429" \f[I]SECONDS\f[] Number of seconds to wait when receiving a '429 Too Many Requests' response .TP .B "\-\-sleep\-extractor" \f[I]SECONDS\f[] Number of seconds to wait before starting data extraction for an input URL .TP .B "\-\-no\-part" Do not use .part files .TP .B "\-\-no\-skip" Do not skip downloads; overwrite existing files .TP .B "\-\-no\-mtime" Do not set file modification times according to Last-Modified HTTP response headers .TP .B "\-\-no\-download" Do not download any files .TP .B "\-o, \-\-option" \f[I]KEY=VALUE\f[] Additional options. 
Example: -o browser=firefox .TP .B "\-c, \-\-config" \f[I]FILE\f[] Additional configuration files .TP .B "\-\-config\-yaml" \f[I]FILE\f[] Additional configuration files in YAML format .TP .B "\-\-config\-toml" \f[I]FILE\f[] Additional configuration files in TOML format .TP .B "\-\-config\-create" Create a basic configuration file .TP .B "\-\-config\-status" Show configuration file status .TP .B "\-\-config\-open" Open configuration file in external application .TP .B "\-\-config\-ignore" Do not read default configuration files .TP .B "\-u, \-\-username" \f[I]USER\f[] Username to login with .TP .B "\-p, \-\-password" \f[I]PASS\f[] Password belonging to the given username .TP .B "\-\-netrc" Enable .netrc authentication data .TP .B "\-C, \-\-cookies" \f[I]FILE\f[] File to load additional cookies from .TP .B "\-\-cookies\-export" \f[I]FILE\f[] Export session cookies to FILE .TP .B "\-\-cookies\-from\-browser" \f[I]BROWSER[/DOMAIN][+KEYRING][:PROFILE][::CONTAINER]\f[] Name of the browser to load cookies from, with optional domain prefixed with '/', keyring name prefixed with '+', profile prefixed with ':', and container prefixed with '::' ('none' for no container (default), 'all' for all containers) .TP .B "\-A, \-\-abort" \f[I]N\f[] Stop current extractor run after N consecutive file downloads were skipped .TP .B "\-T, \-\-terminate" \f[I]N\f[] Stop current and parent extractor runs after N consecutive file downloads were skipped .TP .B "\-\-filesize\-min" \f[I]SIZE\f[] Do not download files smaller than SIZE (e.g. 500k or 2.5M) .TP .B "\-\-filesize\-max" \f[I]SIZE\f[] Do not download files larger than SIZE (e.g. 500k or 2.5M) .TP .B "\-\-download\-archive" \f[I]FILE\f[] Record successfully downloaded files in FILE and skip downloading any file already in it .TP .B "\-\-range" \f[I]RANGE\f[] Index range(s) specifying which files to download. These can be either a constant value, range, or slice (e.g. '5', '8-20', or '1:24:3') .TP .B "\-\-chapter\-range" \f[I]RANGE\f[] Like '--range', but applies to manga chapters and other delegated URLs .TP .B "\-\-filter" \f[I]EXPR\f[] Python expression controlling which files to download. Files for which the expression evaluates to False are ignored. Available keys are the filename-specific ones listed by '-K'. Example: --filter "image_width >= 1000 and rating in ('s', 'q')" .TP .B "\-\-chapter\-filter" \f[I]EXPR\f[] Like '--filter', but applies to manga chapters and other delegated URLs .TP .B "\-P, \-\-postprocessor" \f[I]NAME\f[] Activate the specified post processor .TP .B "\-\-no\-postprocessors" Do not run any post processors .TP .B "\-O, \-\-postprocessor\-option" \f[I]KEY=VALUE\f[] Additional post processor options .TP .B "\-\-write\-metadata" Write metadata to separate JSON files .TP .B "\-\-write\-info\-json" Write gallery metadata to a info.json file .TP .B "\-\-write\-tags" Write image tags to separate text files .TP .B "\-\-zip" Store downloaded files in a ZIP archive .TP .B "\-\-cbz" Store downloaded files in a CBZ archive .TP .B "\-\-mtime" \f[I]NAME\f[] Set file modification times according to metadata selected by NAME. Examples: 'date' or 'status[date]' .TP .B "\-\-rename" \f[I]FORMAT\f[] Rename previously downloaded files from FORMAT to the current filename format .TP .B "\-\-rename\-to" \f[I]FORMAT\f[] Rename previously downloaded files from the current filename format to FORMAT .TP .B "\-\-ugoira" \f[I]FMT\f[] Convert Pixiv Ugoira to FMT using FFmpeg. 
Supported formats are 'webm', 'mp4', 'gif', 'vp8', 'vp9', 'vp9-lossless', 'copy', 'zip'. .TP .B "\-\-exec" \f[I]CMD\f[] Execute CMD for each downloaded file. Supported replacement fields are {} or {_path}, {_directory}, {_filename}. Example: --exec "convert {} {}.png && rm {}" .TP .B "\-\-exec\-after" \f[I]CMD\f[] Execute CMD after all files were downloaded. Example: --exec-after "cd {_directory} && convert * ../doc.pdf" .SH EXAMPLES .TP gallery-dl \f[I]URL\f[] Download images from \f[I]URL\f[]. .TP gallery-dl -g -u -p \f[I]URL\f[] Print direct URLs from a site that requires authentication. .TP gallery-dl --filter 'type == "ugoira"' --range '2-4' \f[I]URL\f[] Apply filter and range expressions. This will only download the second, third, and fourth file where its type value is equal to "ugoira". .TP gallery-dl r:\f[I]URL\f[] Scan \f[I]URL\f[] for other URLs and invoke \f[B]gallery-dl\f[] on them. .TP gallery-dl oauth:\f[I]SITE\-NAME\f[] Gain OAuth authentication tokens for .IR deviantart , .IR flickr , .IR reddit , .IR smugmug ", and" .IR tumblr . .SH FILES .TP .I /etc/gallery-dl.conf The system wide configuration file. .TP .I ~/.config/gallery-dl/config.json Per user configuration file. .TP .I ~/.gallery-dl.conf Alternate per user configuration file. .SH BUGS https://github.com/mikf/gallery-dl/issues .SH AUTHORS Mike Fährmann .br and https://github.com/mikf/gallery-dl/graphs/contributors .SH "SEE ALSO" .BR gallery-dl.conf (5) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753638551.0 gallery_dl-1.30.2/data/man/gallery-dl.conf.50000644000175000017500000057232215041463227017160 0ustar00mikemike.TH "GALLERY-DL.CONF" "5" "2025-07-27" "1.30.2" "gallery-dl Manual" .\" disable hyphenation .nh .\" disable justification (adjust text to left margin only) .ad l .SH NAME gallery-dl.conf \- gallery-dl configuration file .SH DESCRIPTION gallery-dl will search for configuration files in the following places every time it is started, unless .B --ignore-config is specified: .PP .RS 4 .nf .I /etc/gallery-dl.conf .I $HOME/.config/gallery-dl/config.json .I $HOME/.gallery-dl.conf .fi .RE .PP It is also possible to specify additional configuration files with the .B -c/--config command-line option or to add further option values with .B -o/--option as = pairs, Configuration files are JSON-based and therefore don't allow any ordinary comments, but, since unused keys are simply ignored, it is possible to utilize those as makeshift comments by settings their values to arbitrary strings. .SH EXAMPLE { .RS 4 "base-directory": "/tmp/", .br "extractor": { .RS 4 "pixiv": { .RS 4 "directory": ["Pixiv", "Works", "{user[id]}"], .br "filename": "{id}{num}.{extension}", .br "username": "foo", .br "password": "bar" .RE }, .br "flickr": { .RS 4 "_comment": "OAuth keys for account 'foobar'", .br "access-token": "0123456789-0123456789abcdef", .br "access-token-secret": "fedcba9876543210" .RE } .RE }, .br "downloader": { .RS 4 "retries": 3, .br "timeout": 2.5 .RE } .RE } .SH EXTRACTOR OPTIONS .SS extractor.*.filename .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]object\f[] (condition -> \f[I]format string\f[]) .IP "Example:" 4 .. code:: json "{manga}_c{chapter}_{page:>03}.{extension}" .. code:: json { "extension == 'mp4'": "{id}_video.{extension}", "'nature' in title" : "{id}_{title}.{extension}", "" : "{id}_default.{extension}" } .IP "Description:" 4 A \f[I]format string\f[] to build filenames for downloaded files with. 
If this is an \f[I]object\f[], it must contain Python expressions mapping to the filename format strings to use. These expressions are evaluated in the specified order until one evaluates to \f[I]True\f[]. The available replacement keys depend on the extractor used. A list of keys for a specific one can be acquired by calling *gallery-dl* with the \f[I]-K\f[]/\f[I]--list-keywords\f[] command-line option. For example: .. code:: $ gallery-dl -K http://seiga.nicovideo.jp/seiga/im5977527 Keywords for directory names: category seiga subcategory image Keywords for filenames: category seiga extension None image-id 5977527 subcategory image Note: Even if the value of the \f[I]extension\f[] key is missing or \f[I]None\f[], it will be filled in later when the file download is starting. This key is therefore always available to provide a valid filename extension. .SS extractor.*.directory .IP "Type:" 6 .br * \f[I]list\f[] of \f[I]strings\f[] .br * \f[I]object\f[] (condition -> \f[I]format strings\f[]) .IP "Example:" 4 .. code:: json ["{category}", "{manga}", "c{chapter} - {title}"] .. code:: json { "'nature' in content": ["Nature Pictures"], "retweet_id != 0" : ["{category}", "{user[name]}", "Retweets"], "" : ["{category}", "{user[name]}"] } .IP "Description:" 4 A list of \f[I]format strings\f[] to build target directory paths with. If this is an \f[I]object\f[], it must contain Python expressions mapping to the list of format strings to use. Each individual string in such a list represents a single path segment, which will be joined together and appended to the \f[I]base-directory\f[] to form the complete target directory path. .SS extractor.*.base-directory .IP "Type:" 6 \f[I]Path\f[] .IP "Default:" 9 \f[I]"./gallery-dl/"\f[] .IP "Description:" 4 Directory path used as base for all download destinations. .SS extractor.*.parent-directory .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Use an extractor's current target directory as \f[I]base-directory\f[] for any spawned child extractors. .SS extractor.*.metadata-parent .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 If \f[I]true\f[], overwrite any metadata provided by a child extractor with its parent's. If this is a \f[I]string\f[], add a parent's metadata to its children's .br to a field named after said string. For example with \f[I]"parent-metadata": "_p_"\f[]: .br .. code:: json { "id": "child-id", "_p_": {"id": "parent-id"} } .SS extractor.*.parent-skip .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Share number of skipped downloads between parent and child extractors. .SS extractor.*.path-restrict .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]object\f[] (character -> replacement character(s)) .IP "Default:" 9 \f[I]"auto"\f[] .IP "Example:" 4 .br * "/!? (){}" .br * {"/": "_", "+": "_+_", "({[": "(", "]})": ")", "a-z": "*"} .IP "Description:" 4 A \f[I]string\f[] of characters to be replaced with the value of .br \f[I]path-replace\f[] or an \f[I]object\f[] mapping invalid/unwanted characters, character sets, .br or character ranges to their replacements for generated path segment names. 
.br Special values: .br * \f[I]"auto"\f[]: Use characters from \f[I]"unix"\f[] or \f[I]"windows"\f[] depending on the local operating system .br * \f[I]"unix"\f[]: \f[I]"/"\f[] .br * \f[I]"windows"\f[]: \f[I]"\\\\\\\\|/<>:\\"?*"\f[] .br * \f[I]"ascii"\f[]: \f[I]"^0-9A-Za-z_."\f[] (only ASCII digits, letters, underscores, and dots) .br * \f[I]"ascii+"\f[]: \f[I]"^0-9@-[\\\\]-{ #-)+-.;=!}~"\f[] (all ASCII characters except the ones not allowed by Windows) Implementation Detail: For \f[I]strings\f[] with length >= 2, this option uses a \f[I]Regular Expression Character Set\f[], meaning that: .br * using a caret \f[I]^\f[] as first character inverts the set .br * character ranges are supported (\f[I]0-9a-z\f[]) .br * \f[I]]\f[], \f[I]-\f[], and \f[I]\\\f[] need to be escaped as \f[I]\\\\]\f[], \f[I]\\\\-\f[], and \f[I]\\\\\\\\\f[] respectively to use them as literal characters .SS extractor.*.path-replace .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"_"\f[] .IP "Description:" 4 The replacement character(s) for \f[I]path-restrict\f[] .SS extractor.*.path-remove .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"\\u0000-\\u001f\\u007f"\f[] (ASCII control characters) .IP "Description:" 4 Set of characters to remove from generated path names. Note: In a string with 2 or more characters, \f[I][]^-\\\f[] need to be escaped with backslashes, e.g. \f[I]"\\\\[\\\\]"\f[] .SS extractor.*.path-strip .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 Set of characters to remove from the end of generated path segment names using \f[I]str.rstrip()\f[] Special values: .br * \f[I]"auto"\f[]: Use characters from \f[I]"unix"\f[] or \f[I]"windows"\f[] depending on the local operating system .br * \f[I]"unix"\f[]: \f[I]""\f[] .br * \f[I]"windows"\f[]: \f[I]". "\f[] .SS extractor.*.path-extended .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 On Windows, use \f[I]extended-length paths\f[] prefixed with \f[I]\\\\?\\\f[] to work around the 260 characters path length limit. .SS extractor.*.extension-map .IP "Type:" 6 \f[I]object\f[] (extension -> replacement) .IP "Default:" 9 .. code:: json { "jpeg": "jpg", "jpe" : "jpg", "jfif": "jpg", "jif" : "jpg", "jfi" : "jpg" } .IP "Description:" 4 A JSON \f[I]object\f[] mapping filename extensions to their replacements. .SS extractor.*.skip .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls the behavior when downloading files that have been downloaded before, i.e. a file with the same filename already exists or its ID is in a \f[I]download archive\f[]. .br * \f[I]true\f[]: Skip downloads .br * \f[I]false\f[]: Overwrite already existing files .br * \f[I]"abort"\f[]: Stop the current extractor run .br * \f[I]"abort:N"\f[]: Skip downloads and stop the current extractor run after \f[I]N\f[] consecutive skips .br * \f[I]"terminate"\f[]: Stop the current extractor run, including parent extractors .br * \f[I]"terminate:N"\f[]: Skip downloads and stop the current extractor run, including parent extractors, after \f[I]N\f[] consecutive skips .br * \f[I]"exit"\f[]: Exit the program altogether .br * \f[I]"exit:N"\f[]: Skip downloads and exit the program after \f[I]N\f[] consecutive skips .br * \f[I]"enumerate"\f[]: Add an enumeration index to the beginning of the filename extension (\f[I]file.1.ext\f[], \f[I]file.2.ext\f[], etc.) 
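As an illustration of \f[I]skip\f[] values, a minimal configuration sketch (the \f[I]pixiv\f[] category and the chosen values are only examples) could be:

.. code:: json

{
    "extractor": {
        "skip": "abort:3",
        "pixiv": {
            "skip": "enumerate"
        }
    }
}

With these settings, an extractor run stops after 3 consecutive skipped downloads, while already-existing \f[I]pixiv\f[] files are written with an enumeration index instead of being skipped.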
.SS extractor.*.skip-filter .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 Python expression controlling which skipped files to count towards \f[I]"abort"\f[] / \f[I]"terminate"\f[] / \f[I]"exit"\f[]. .SS extractor.*.sleep .IP "Type:" 6 \f[I]Duration\f[] .IP "Default:" 9 \f[I]0\f[] .IP "Description:" 4 Number of seconds to sleep before each download. .SS extractor.*.sleep-extractor .IP "Type:" 6 \f[I]Duration\f[] .IP "Default:" 9 \f[I]0\f[] .IP "Description:" 4 Number of seconds to sleep before handling an input URL, i.e. before starting a new extractor. .SS extractor.*.sleep-429 .IP "Type:" 6 \f[I]Duration\f[] .IP "Default:" 9 \f[I]60\f[] .IP "Description:" 4 Number of seconds to sleep when receiving a 429 Too Many Requests response before \f[I]retrying\f[] the request. .SS extractor.*.sleep-request .IP "Type:" 6 \f[I]Duration\f[] .IP "Default:" 9 .br * \f[I]"0.5-1.5"\f[] \f[I]ao3\f[], \f[I]arcalive\f[], \f[I]civitai\f[], \f[I][Danbooru]\f[], \f[I][E621]\f[], \f[I][foolfuuka]:search\f[], \f[I]itaku\f[], \f[I]newgrounds\f[], \f[I][philomena]\f[], \f[I]pixiv-novel\f[], \f[I]plurk\f[], \f[I]poipiku\f[] , \f[I]pornpics\f[], \f[I]schalenetwork\f[], \f[I]scrolller\f[], \f[I]soundgasm\f[], \f[I]urlgalleries\f[], \f[I]vk\f[], \f[I]webtoons\f[], \f[I]weebcentral\f[], \f[I]xfolio\f[], \f[I]zerochan\f[] .br * \f[I]"1.0"\f[] \f[I]furaffinity\f[] .br * \f[I]"1.0-2.0"\f[] \f[I]flickr\f[], \f[I]pexels\f[], \f[I]weibo\f[], \f[I][wikimedia]\f[] .br * \f[I]"1.4"\f[] \f[I]wallhaven\f[] .br * \f[I]"2.0-4.0"\f[] \f[I]behance\f[], \f[I]imagefap\f[], \f[I][Nijie]\f[] .br * \f[I]"3.0-6.0"\f[] \f[I]bilibili\f[], \f[I]exhentai\f[], \f[I]idolcomplex\f[], \f[I][reactor]\f[], \f[I]readcomiconline\f[] .br * \f[I]"6.0-6.1"\f[] \f[I]twibooru\f[] .br * \f[I]"6.0-12.0"\f[] \f[I]instagram\f[] .br * \f[I]0\f[] otherwise .IP "Description:" 4 Minimal time interval in seconds between each HTTP request during data extraction. .SS extractor.*.username & .password .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 The username and password to use when attempting to log in to another site. This is supported for .br * \f[I]aibooru\f[] (*) .br * \f[I]ao3\f[] .br * \f[I]aryion\f[] .br * \f[I]atfbooru\f[] (*) .br * \f[I]bluesky\f[] .br * \f[I]booruvar\f[] (*) .br * \f[I]coomer\f[] .br * \f[I]danbooru\f[] (*) .br * \f[I]deviantart\f[] .br * \f[I]e621\f[] (*) .br * \f[I]e6ai\f[] (*) .br * \f[I]e926\f[] (*) .br * \f[I]exhentai\f[] .br * \f[I]girlswithmuscle\f[] .br * \f[I]horne\f[] (R) .br * \f[I]idolcomplex\f[] .br * \f[I]imgbb\f[] .br * \f[I]inkbunny\f[] .br * \f[I]iwara\f[] .br * \f[I]kemono\f[] .br * \f[I]madokami\f[] (R) .br * \f[I]mangadex\f[] .br * \f[I]mangoxo\f[] .br * \f[I]newgrounds\f[] .br * \f[I]nijie\f[] (R) .br * \f[I]pillowfort\f[] .br * \f[I]rule34xyz\f[] .br * \f[I]sankaku\f[] .br * \f[I]schalenetwork\f[] .br * \f[I]scrolller\f[] .br * \f[I]seiga\f[] .br * \f[I]subscribestar\f[] .br * \f[I]tapas\f[] .br * \f[I]tsumino\f[] .br * \f[I]twitter\f[] .br * \f[I]vipergirls\f[] .br * \f[I]zerochan\f[] These values can also be specified via the \f[I]-u/--username\f[] and \f[I]-p/--password\f[] command-line options or by using a \f[I].netrc\f[] file. (see Authentication_) (*) The password value for these sites should be the API key found in your user profile, not the actual account password. 
(R) Login with username & password or supplying logged-in \f[I]cookies\f[] is required Note: Leave the \f[I]password\f[] value empty or undefined to be prompted for a password when performing a login (see \f[I]getpass()\f[]). .SS extractor.*.input .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] if stdin is attached to a terminal, \f[I]false\f[] otherwise .IP "Description:" 4 Allow prompting the user for interactive input. .SS extractor.*.netrc .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Enable the use of \f[I].netrc\f[] authentication data. .SS extractor.*.cookies .IP "Type:" 6 .br * \f[I]Path\f[] .br * \f[I]object\f[] (name -> value) .br * \f[I]list\f[] .IP "Description:" 4 Source to read additional cookies from. This can be .br * The \f[I]Path\f[] to a Mozilla/Netscape format cookies.txt file .. code:: json "~/.local/share/cookies-instagram-com.txt" .br * An \f[I]object\f[] specifying cookies as name-value pairs .. code:: json { "cookie-name": "cookie-value", "sessionid" : "14313336321%3AsabDFvuASDnlpb%3A31", "isAdult" : "1" } .br * A \f[I]list\f[] with up to 5 entries specifying a browser profile. .br * The first entry is the browser name .br * The optional second entry is a profile name or an absolute path to a profile directory .br * The optional third entry is the keyring to retrieve passwords for decrypting cookies from .br * The optional fourth entry is a (Firefox) container name (\f[I]"none"\f[] for only cookies with no container (default)) .br * The optional fifth entry is the domain to extract cookies for. Prefix it with a dot \f[I].\f[] to include cookies for subdomains. .. code:: json ["firefox"] ["firefox", null, null, "Personal"] ["chromium", "Private", "kwallet", null, ".twitter.com"] .SS extractor.*.cookies-select .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"random"\f[] .IP "Description:" 4 Interpret \f[I]extractor.cookies\f[] as a list of cookie sources and select one of them for each extractor run. .br * \f[I]"random"\f[]: Select cookies \f[I]randomly\f[] .br * \f[I]"rotate"\f[]: Select cookies in sequence. Start over from the beginning after reaching the end of the list. .. code:: json [ "~/.local/share/cookies-instagram-com-1.txt", "~/.local/share/cookies-instagram-com-2.txt", "~/.local/share/cookies-instagram-com-3.txt", ["firefox", null, null, "c1", ".instagram-com"], ] .SS extractor.*.cookies-update .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]Path\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Export session cookies in cookies.txt format. .br * If this is a \f[I]Path\f[], write cookies to the given file path. .br * If this is \f[I]true\f[] and \f[I]extractor.*.cookies\f[] specifies the \f[I]Path\f[] of a valid cookies.txt file, update its contents. .SS extractor.*.proxy .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]object\f[] (scheme -> proxy) .IP "Example:" 4 .. code:: json "http://10.10.1.10:3128" .. code:: json { "http" : "http://10.10.1.10:3128", "https": "http://10.10.1.10:1080", "http://10.20.1.128": "http://10.10.1.10:5323" } .IP "Description:" 4 Proxy (or proxies) to be used for remote connections. .br * If this is a \f[I]string\f[], it is the proxy URL for all outgoing requests. .br * If this is an \f[I]object\f[], it is a scheme-to-proxy mapping to specify different proxy URLs for each scheme. It is also possible to set a proxy for a specific host by using \f[I]scheme://host\f[] as key. See \f[I]Requests' proxy documentation\f[] for more details. 
Note: If a proxy URL does not include a scheme, \f[I]http://\f[] is assumed. .SS extractor.*.proxy-env .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Collect proxy configuration information from environment variables (\f[I]HTTP_PROXY\f[], \f[I]HTTPS_PROXY\f[], \f[I]NO_PROXY\f[]) and Windows Registry settings. .SS extractor.*.source-address .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] with 1 \f[I]string\f[] and 1 \f[I]integer\f[] as elements .IP "Example:" 4 .br * "192.168.178.20" .br * ["192.168.178.20", 8080] .IP "Description:" 4 Client-side IP address to bind to. Can be either a simple \f[I]string\f[] with just the local IP address .br or a \f[I]list\f[] with IP and explicit port number as elements. .br .SS extractor.*.user-agent .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 .br * \f[I]"gallery-dl/VERSION"\f[]: \f[I][Danbooru]\f[], \f[I]mangadex\f[], \f[I]weasyl\f[] .br * \f[I]"gallery-dl/VERSION (by mikf)"\f[]: \f[I][E621]\f[] .br * \f[I]"net.umanle.arca.android.playstore/0.9.75"\f[]: \f[I]arcalive\f[] .br * \f[I]"Patreon/72.2.28 (Android; Android 14; Scale/2.10)"\f[]: \f[I]patreon\f[] .br * \f[I]"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/LATEST.0.0.0 Safari/537.36"\f[]: \f[I]instagram\f[] .br * \f[I]"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:LATEST) Gecko/20100101 Firefox/LATEST"\f[]: otherwise .IP "Description:" 4 User-Agent header value used for HTTP requests. Setting this value to \f[I]"browser"\f[] will try to automatically detect and use the \f[I]User-Agent\f[] header of the system's default browser. .SS extractor.*.browser .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 .br * \f[I]"firefox"\f[]: \f[I]artstation\f[], \f[I]behance\f[], \f[I]fanbox\f[], \f[I]twitter\f[] .br * \f[I]null\f[]: otherwise .IP "Example:" 4 .br * "firefox/128:linux" .br * "chrome:macos" .IP "Description:" 4 Try to emulate a real browser (\f[I]firefox\f[] or \f[I]chrome\f[]) by using their default HTTP headers and TLS ciphers for HTTP requests. Optionally, the operating system used in the \f[I]User-Agent\f[] header can be specified after a \f[I]:\f[] (\f[I]windows\f[], \f[I]linux\f[], or \f[I]macos\f[]). Supported browsers: .br * \f[I]firefox\f[] .br * \f[I]firefox/140\f[] .br * \f[I]firefox/128\f[] .br * \f[I]chrome\f[] .br * \f[I]chrome/138\f[] .br * \f[I]chrome/111\f[] Note: This option sets custom \f[I]headers\f[] and \f[I]ciphers\f[] defaults. Note: \f[I]requests\f[] and \f[I]urllib3\f[] only support HTTP/1.1, while a real browser would use HTTP/2. .SS extractor.*.referer .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Send \f[I]Referer\f[] headers with all outgoing HTTP requests. If this is a \f[I]string\f[], send it as Referer instead of the extractor's \f[I]root\f[] domain. .SS extractor.*.headers .IP "Type:" 6 .br * \f[I]"string"\f[] .br * \f[I]object\f[] (name -> value) .IP "Default:" 9 .. code:: json { "User-Agent" : "", "Accept" : "*/*", "Accept-Language": "en-US,en;q=0.5", "Accept-Encoding": "gzip, deflate", "Referer" : "" } .IP "Description:" 4 Additional \f[I]HTTP headers\f[] to be sent with each HTTP request, To disable sending a header, set its value to \f[I]null\f[]. Set this option to \f[I]"firefox"\f[] or \f[I]"chrome"\f[] to use these browser's default headers. .SS extractor.*.ciphers .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 .br * "firefox" .br * .. 
code:: json ["ECDHE-ECDSA-AES128-GCM-SHA256", "ECDHE-RSA-AES128-GCM-SHA256", "ECDHE-ECDSA-CHACHA20-POLY1305", "ECDHE-RSA-CHACHA20-POLY1305"] .IP "Description:" 4 List of TLS/SSL cipher suites in \f[I]OpenSSL cipher list format\f[] to be passed to \f[I]ssl.SSLContext.set_ciphers()\f[] Set this option to \f[I]"firefox"\f[] or \f[I]"chrome"\f[] to use these browser's default ciphers. .SS extractor.*.tls12 .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 .br * \f[I]false\f[]: \f[I]artstation\f[], \f[I]behance\f[] .br * \f[I]true\f[]: otherwise .IP "Description:" 4 Allow selecting TLS 1.2 cipher suites. Can be disabled to alter TLS fingerprints and potentially bypass Cloudflare blocks. .SS extractor.*.keywords .IP "Type:" 6 \f[I]object\f[] (name -> value) .IP "Example:" 4 {"type": "Pixel Art", "type_id": 123} .IP "Description:" 4 Additional name-value pairs to be added to each metadata dictionary. .SS extractor.*.keywords-eval .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Evaluate each \f[I]keywords\f[] \f[I]string\f[] value as a \f[I]format string\f[]. .SS extractor.*.keywords-default .IP "Type:" 6 any .IP "Default:" 9 \f[I]"None"\f[] .IP "Description:" 4 Default value used for missing or undefined keyword names in \f[I]format strings\f[]. .SS extractor.*.url-metadata .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 Insert a file's download URL into its metadata dictionary as the given name. For example, setting this option to \f[I]"gdl_file_url"\f[] will cause a new metadata field with name \f[I]gdl_file_url\f[] to appear, which contains the current file's download URL. This can then be used in \f[I]filenames\f[], with a \f[I]metadata\f[] post processor, etc. .SS extractor.*.path-metadata .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 Insert a reference to the current \f[I]PathFormat\f[] data structure into metadata dictionaries as the given name. For example, setting this option to \f[I]"gdl_path"\f[] would make it possible to access the current file's filename as \f[I]"{gdl_path.filename}"\f[]. .SS extractor.*.extractor-metadata .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 Insert a reference to the current \f[I]Extractor\f[] object into metadata dictionaries as the given name. .SS extractor.*.http-metadata .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 Insert an \f[I]object\f[] containing a file's HTTP headers and \f[I]filename\f[], \f[I]extension\f[], and \f[I]date\f[] parsed from them into metadata dictionaries as the given name. For example, setting this option to \f[I]"gdl_http"\f[] would make it possible to access the current file's \f[I]Last-Modified\f[] header as \f[I]"{gdl_http[Last-Modified]}"\f[] and its parsed form as \f[I]"{gdl_http[date]}"\f[]. .SS extractor.*.version-metadata .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 Insert an \f[I]object\f[] containing gallery-dl's version info into metadata dictionaries as the given name. The content of the object is as follows: .. code:: json { "version" : "string", "is_executable" : "bool", "current_git_head": "string or null" } .SS extractor.*.category-transfer .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 Extractor-specific .IP "Description:" 4 Transfer an extractor's (sub)category values to all child extractors spawned by it, to let them inherit their parent's config options. 
.SS extractor.*.blacklist & .whitelist .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["oauth", "recursive", "test"]\f[] + current extractor category .IP "Example:" 4 ["imgur", "redgifs:user", "*:image"] .IP "Description:" 4 A list of extractor identifiers to ignore (or allow) when spawning child extractors for unknown URLs, e.g. from \f[I]reddit\f[] or \f[I]plurk\f[]. Each identifier can be .br * A category or basecategory name (\f[I]"imgur"\f[], \f[I]"mastodon"\f[]) .br * | A (base)category-subcategory pair, where both names are separated by a colon (\f[I]"redgifs:user"\f[]). Both names can be a * or left empty, matching all possible names (\f[I]"*:image"\f[], \f[I]":user"\f[]). .br Note: Any \f[I]blacklist\f[] setting will automatically include \f[I]"oauth"\f[], \f[I]"recursive"\f[], and \f[I]"test"\f[]. .SS extractor.*.archive .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]Path\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Example:" 4 .br * "$HOME/.archives/{category}.sqlite3" .br * "postgresql://user:pass@host/database" .IP "Description:" 4 File to store IDs of downloaded files in. Downloads of files already recorded in this archive file will be \f[I]skipped\f[]. The resulting archive file is not a plain text file but an SQLite3 database, as either lookup operations are significantly faster or memory requirements are significantly lower when the amount of stored IDs gets reasonably large. If this value is a \f[I]PostgreSQL Connection URI\f[], the archive will use this PostgreSQL database as backend (requires \f[I]Psycopg\f[]). Note: Archive files that do not already exist get generated automatically. Note: Archive paths support regular \f[I]format string\f[] replacements, but be aware that using external inputs for building local paths may pose a security risk. .SS extractor.*.archive-event .IP "Type:" 6 + \f[I]string\f[] + \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"file"\f[] .IP "Example:" 4 .br * "file,skip" .br * ["file", "skip"] .IP "Description:" 4 \f[I]Event(s)\f[] for which IDs get written to an \f[I]archive\f[]. Available events are: \f[I]file\f[], \f[I]skip\f[] .SS extractor.*.archive-format .IP "Type:" 6 \f[I]string\f[] .IP "Example:" 4 "{id}_{offset}" .IP "Description:" 4 An alternative \f[I]format string\f[] to build archive IDs with. .SS extractor.*.archive-mode .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"file"\f[] .IP "Description:" 4 Controls when to write \f[I]archive IDs\f[] to the archive database. .br * \f[I]"file"\f[]: Write IDs immediately after completing or skipping a file download. .br * \f[I]"memory"\f[]: Keep IDs in memory and only write them after successful job completion. .SS extractor.*.archive-prefix .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 .br * \f[I]""\f[] when \f[I]archive-table\f[] is set .br * \f[I]"{category}"\f[] otherwise .IP "Description:" 4 Prefix for archive IDs. .SS extractor.*.archive-pragma .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 ["journal_mode=WAL", "synchronous=NORMAL"] .IP "Description:" 4 A list of SQLite \f[I]PRAGMA\f[] statements to run during archive initialization. See \f[I]\f[] for available \f[I]PRAGMA\f[] statements and further details. .SS extractor.*.archive-table .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"archive"\f[] .IP "Example:" 4 "{category}" .IP "Description:" 4 \f[I]Format string\f[] selecting the archive database table name. 
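Taken together, the archive-related options above could be combined as in the following sketch (paths and values are examples only):

.. code:: json

{
    "extractor": {
        "archive": "$HOME/.archives/{category}.sqlite3",
        "archive-event": ["file", "skip"],
        "archive-format": "{id}_{offset}",
        "archive-pragma": ["journal_mode=WAL", "synchronous=NORMAL"]
    }
}

This records IDs for both downloaded and skipped files in a per-category SQLite3 database.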
.SS extractor.*.actions .IP "Type:" 6 .br * \f[I]object\f[] (pattern -> \f[I]Action(s)\f[]) .br * \f[I]list\f[] of \f[I]lists\f[] with pattern -> \f[I]Action(s)\f[] pairs as elements .IP "Example:" 4 .. code:: json { "info:Logging in as .+" : "level = debug", "warning:(?i)unable to .+": "exit 127", "error" : [ "status \f[I]= 1", "exec notify.sh 'gdl error'", "abort" ] } .. code:: json [ ["info:Logging in as .+" , "level = debug"], ["warning:(?i)unable to .+", "exit 127" ], ["error" , [ "status \f[]= 1", "exec notify.sh 'gdl error'", "abort" ]] ] .IP "Description:" 4 Perform an \f[I]Action\f[] when logging a message matched by \f[I]pattern\f[]. \f[I]pattern\f[] is parsed as severity level (\f[I]debug\f[], \f[I]info\f[], \f[I]warning\f[], \f[I]error\f[], or integer value) followed by an optional \f[I]Python Regular Expression\f[] separated by a colon \f[I]:\f[] Using \f[I]*\f[] as level or leaving it empty matches logging messages of all levels (e.g. \f[I]*:\f[] or \f[I]:\f[]). .SS extractor.*.postprocessors .IP "Type:" 6 .br * \f[I]Postprocessor Configuration\f[] object .br * \f[I]list\f[] of \f[I]Postprocessor Configuration\f[] objects .IP "Example:" 4 .. code:: json [ { "name": "zip" , "compression": "store" }, { "name": "exec", "command": ["/home/foobar/script", "{category}", "{image_id}"] } ] .IP "Description:" 4 A list of \f[I]post processors\f[] to be applied to each downloaded file in the specified order. Unlike other options, a \f[I]postprocessors\f[] setting at a deeper level .br does not override any \f[I]postprocessors\f[] setting at a lower level. Instead, all post processors from all applicable \f[I]postprocessors\f[] .br settings get combined into a single list. For example .br * an \f[I]mtime\f[] post processor at \f[I]extractor.postprocessors\f[], .br * a \f[I]zip\f[] post processor at \f[I]extractor.pixiv.postprocessors\f[], .br * and using \f[I]--exec\f[] will run all three post processors - \f[I]mtime\f[], \f[I]zip\f[], \f[I]exec\f[] - for each downloaded \f[I]pixiv\f[] file. .SS extractor.*.postprocessor-options .IP "Type:" 6 \f[I]object\f[] (name -> value) .IP "Example:" 4 .. code:: json { "archive": null, "keep-files": true } .IP "Description:" 4 Additional \f[I]Postprocessor Options\f[] that get added to each individual \f[I]post processor object\f[] before initializing it and evaluating filters. .SS extractor.*.retries .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]4\f[] .IP "Description:" 4 Maximum number of times a failed HTTP request is retried before giving up, or \f[I]-1\f[] for infinite retries. .SS extractor.*.retry-codes .IP "Type:" 6 \f[I]list\f[] of \f[I]integers\f[] .IP "Example:" 4 [404, 429, 430] .IP "Description:" 4 Additional \f[I]HTTP response status codes\f[] to retry an HTTP request on. \f[I]2xx\f[] codes (success responses) and \f[I]3xx\f[] codes (redirection messages) will never be retried and always count as success, regardless of this option. \f[I]5xx\f[] codes (server error responses) will always be retried, regardless of this option. .SS extractor.*.timeout .IP "Type:" 6 \f[I]float\f[] .IP "Default:" 9 \f[I]30.0\f[] .IP "Description:" 4 Amount of time (in seconds) to wait for a successful connection and response from a remote server. This value gets internally used as the \f[I]timeout\f[] parameter for the \f[I]requests.request()\f[] method. .SS extractor.*.verify .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls whether to verify SSL/TLS certificates for HTTPS requests. 
If this is a \f[I]string\f[], it must be the path to a CA bundle to use instead of the default certificates. This value gets internally used as the \f[I]verify\f[] parameter for the \f[I]requests.request()\f[] method. .SS extractor.*.truststore .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Use a .br \f[I]truststore\f[] \f[I]SSLContext\f[] for verifying SSL/TLS certificates to make use of your system's native certificate stores .br instead of relying on \f[I]certifi\f[] certificates. .SS extractor.*.download .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls whether to download media files. Setting this to \f[I]false\f[] won't download any files, but all other functions (\f[I]postprocessors\f[], \f[I]download archive\f[], etc.) will be executed as normal. .SS extractor.*.fallback .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Use fallback download URLs when a download fails. .SS extractor.*.image-range .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Examples:" 4 .br * \f[I]"10-20"\f[] .br * \f[I]"-5, 10, 30-50, 100-"\f[] .br * \f[I]"10:21, 30:51:2, :5, 100:"\f[] .br * \f[I]["-5", "10", "30-50", "100-"]\f[] .IP "Description:" 4 Index range(s) selecting which files to download. These can be specified as .br * index: \f[I]3\f[] (file number 3) .br * range: \f[I]2-4\f[] (files 2, 3, and 4) .br * \f[I]slice\f[]: \f[I]3:8:2\f[] (files 3, 5, and 7) Arguments for range and slice notation are optional .br and will default to begin (\f[I]1\f[]) or end (\f[I]sys.maxsize\f[]) if omitted. For example \f[I]5-\f[], \f[I]5:\f[], and \f[I]5::\f[] all mean "Start at file number 5". .br Note: The index of the first file is \f[I]1\f[]. .SS extractor.*.chapter-range .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 Like \f[I]image-range\f[], but applies to delegated URLs like manga chapters, etc. .SS extractor.*.image-filter .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Examples:" 4 .br * \f[I]"re.search(r'foo(bar)+', description)"\f[] .br * \f[I]["width >= 1200", "width/height > 1.2"]\f[] .IP "Description:" 4 Python expression controlling which files to download. A file only gets downloaded when *all* of the given expressions evaluate to \f[I]True\f[]. Available values are the filename-specific ones listed by \f[I]-K\f[] or \f[I]-j\f[]. .SS extractor.*.chapter-filter .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Examples:" 4 .br * \f[I]"lang == 'en'"\f[] .br * \f[I]["language == 'French'", "10 <= chapter < 20"]\f[] .IP "Description:" 4 Like \f[I]image-filter\f[], but applies to delegated URLs like manga chapters, etc. .SS extractor.*.image-unique .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Ignore image URLs that have been encountered before during the current extractor run. .SS extractor.*.chapter-unique .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Like \f[I]image-unique\f[], but applies to delegated URLs like manga chapters, etc. .SS extractor.*.date-format .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"%Y-%m-%dT%H:%M:%S"\f[] .IP "Description:" 4 Format string used to parse \f[I]string\f[] values of date-min and date-max. See \f[I]strptime\f[] for a list of formatting directives. Note: Despite its name, this option does **not** control how \f[I]{date}\f[] metadata fields are formatted. 
To use a different formatting for those values other than the default \f[I]%Y-%m-%d %H:%M:%S\f[], put \f[I]strptime\f[] formatting directives after a colon \f[I]:\f[], for example \f[I]{date:%Y%m%d}\f[]. .SS extractor.*.write-pages .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 During data extraction, write received HTTP request data to enumerated files in the current working directory. Special values: .br * \f[I]"all"\f[]: Include HTTP request and response headers. Hide \f[I]Authorization\f[], \f[I]Cookie\f[], and \f[I]Set-Cookie\f[] values. .br * \f[I]"ALL"\f[]: Include all HTTP request and response headers. .SH EXTRACTOR-SPECIFIC OPTIONS .SS extractor.ao3.formats .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"pdf"\f[] .IP "Example:" 4 .br * "azw3,epub,mobi,pdf,html" .br * ["azw3", "epub", "mobi", "pdf", "html"] .IP "Description:" 4 Format(s) to download. .SS extractor.arcalive.emoticons .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download emoticon images. .SS extractor.arcalive.gifs .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Try to download \f[I].gif\f[] versions of \f[I].mp4\f[] videos. \f[I]true\f[] | \f[I]"fallback\f[] Use the \f[I].gif\f[] version as primary URL and provide the \f[I].mp4\f[] one as \f[I]fallback\f[]. \f[I]"check"\f[] Check whether a \f[I].gif\f[] version is available by sending an extra HEAD request. \f[I]false\f[] Always download the \f[I].mp4\f[] version. .SS extractor.artstation.external .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Try to follow external URLs of embedded players. .SS extractor.artstation.max-posts .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Limit the number of posts/projects to download. .SS extractor.artstation.mviews .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download \f[I].mview\f[] files. .SS extractor.artstation.previews .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download embed previews. .SS extractor.artstation.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download video clips. .SS extractor.artstation.search.pro-first .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Enable the "Show Studio and Pro member artwork first" checkbox when retrieving search results. .SS extractor.aryion.recursive .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls the post extraction strategy. .br * \f[I]true\f[]: Start on users' main gallery pages and recursively descend into subfolders .br * \f[I]false\f[]: Get posts from "Latest Updates" pages .SS extractor.batoto.domain .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"auto"\f[] .IP "Example:" 4 "mangatoto.org" .IP "Description:" 4 Specifies the domain used by \f[I]batoto\f[] extractors. \f[I]"auto"\f[] | \f[I]"url"\f[] Use the input URL's domain \f[I]"nolegacy"\f[] Use the input URL's domain .br - replace legacy domains with \f[I]"xbato.org"\f[] \f[I]"nowarn"\f[] Use the input URL's domain .br - do not warn about legacy domains any \f[I]string\f[] Use this domain .SS extractor.bbc.width .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]1920\f[] .IP "Description:" 4 Specifies the requested image width. 
This value must be divisble by 16 and gets rounded down otherwise. The maximum possible value appears to be \f[I]1920\f[]. .SS extractor.behance.modules .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["image", "video", "mediacollection", "embed"]\f[] .IP "Description:" 4 Selects which gallery modules to download from. Supported module types are \f[I]image\f[], \f[I]video\f[], \f[I]mediacollection\f[], \f[I]embed\f[], \f[I]text\f[]. .SS extractor.[blogger].api-key .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 Custom Blogger API key. https://developers.google.com/blogger/docs/3.0/using#APIKey .SS extractor.[blogger].videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download embedded videos hosted on https://www.blogger.com/ .SS extractor.bluesky.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 .br * \f[I]"posts"\f[] if \f[I]reposts\f[] or \f[I]quoted\f[] is enabled .br * \f[I]"media"\f[] otherwise .IP "Example:" 4 .br * "avatar,background,posts" .br * ["avatar", "background", "posts"] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"info"\f[], \f[I]"avatar"\f[], \f[I]"background"\f[], \f[I]"posts"\f[], \f[I]"replies"\f[], \f[I]"media"\f[], \f[I]"video"\f[], \f[I]"likes"\f[], It is possible to use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.bluesky.likes.endpoint .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"listRecords"\f[] .IP "Description:" 4 API endpoint to use for retrieving liked posts. \f[I]"listRecords"\f[] Use the results from .br \f[I]com.atproto.repo.listRecords\f[] Requires no login and alows accessing likes of all users, .br but uses one request to \f[I]getPostThread\f[] per post, \f[I]"getActorLikes"\f[] Use the results from .br \f[I]app.bsky.feed.getActorLikes\f[] Requires login and only allows accessing your own likes. .br .SS extractor.bluesky.metadata .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Example:" 4 .br * "facets,user" .br * ["facets", "user"] .IP "Description:" 4 Extract additional metadata. .br * \f[I]facets\f[]: \f[I]hashtags\f[], \f[I]mentions\f[], and \f[I]uris\f[] .br * \f[I]user\f[]: detailed \f[I]user\f[] metadata for the user referenced in the input URL (See \f[I]app.bsky.actor.getProfile\f[]). .SS extractor.bluesky.likes.depth .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]0\f[] .IP "Description:" 4 Sets the maximum depth of returned reply posts. (See depth parameter of \f[I]app.bsky.feed.getPostThread\f[]) .SS extractor.bluesky.quoted .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch media from quoted posts. .SS extractor.bluesky.reposts .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Process reposts. .SS extractor.bluesky.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download videos. .SS extractor.boosty.allowed .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Request only available posts. .SS extractor.boosty.bought .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Request only purchased posts for \f[I]feed\f[] results. 
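As one possible combination of the \f[I]bluesky\f[] options described above (a sketch, not a recommended default):

.. code:: json

{
    "extractor": {
        "bluesky": {
            "include": ["avatar", "posts", "media"],
            "reposts": false,
            "quoted": true,
            "metadata": ["facets", "user"]
        }
    }
}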
.SS extractor.boosty.metadata .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Provide detailed \f[I]user\f[] metadata. .SS extractor.boosty.videos .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Example:" 4 ["full_hd", "high", "medium"] .IP "Description:" 4 Download videos. If this is a \f[I]list\f[], it selects which format to try to download. .br Possibly available formats are .br .br * \f[I]ultra_hd\f[] (2160p) .br * \f[I]quad_hd\f[] (1440p) .br * \f[I]full_hd\f[] (1080p) .br * \f[I]high\f[] (720p) .br * \f[I]medium\f[] (480p) .br * \f[I]low\f[] (360p) .br * \f[I]lowest\f[] (240p) .br * \f[I]tiny\f[] (144p) .SS extractor.bunkr.endpoint .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"/api/_001_v2"\f[] .IP "Description:" 4 API endpoint for retrieving file URLs. .SS extractor.bunkr.tlds .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Controls which \f[I]bunkr\f[] TLDs to accept. .br * \f[I]true\f[]: Match URLs with *all* possible TLDs (e.g. \f[I]bunkr.xyz\f[] or \f[I]bunkrrr.duck\f[]) .br * \f[I]false\f[]: Match only URLs with known TLDs .SS extractor.cien.files .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["image", "video", "download", "gallery"]\f[] .IP "Description:" 4 Determines the type and order of files to download. Available types are \f[I]image\f[], \f[I]video\f[], \f[I]download\f[], \f[I]gallery\f[]. .SS extractor.civitai.api .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"trpc"\f[] .IP "Description:" 4 Selects which API endpoints to use. .br * \f[I]"rest"\f[]: \f[I]Public REST API\f[] .br * \f[I]"trpc"\f[]: Internal tRPC API .SS extractor.civitai.api-key .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 The API Key value generated in your \f[I]User Account Settings\f[] to make authorized API requests. See \f[I]API/Authorization\f[] for details. .SS extractor.civitai.files .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["image"]\f[] .IP "Description:" 4 Determines the type and order of files to download when processing models. Available types are \f[I]model\f[], \f[I]image\f[], \f[I]gallery\f[]. .SS extractor.civitai.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["user-images", "user-videos"]\f[] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are .br * \f[I]"user-models"\f[] .br * \f[I]"user-posts"\f[] .br * \f[I]"user-images"\f[] .br * \f[I]"user-videos"\f[] It is possible to use \f[I]"all"\f[] instead of listing all values separately. .IP "Note:" 4 To get a more complete set of metadata like \f[I]model['name']\f[] and \f[I]post['title']\f[], include \f[I]user-models\f[] and \f[I]user-posts\f[] as well as the default \f[I]user-images\f[] and \f[I]user-videos\f[]: \f[I]["user-models", "user-posts", "user-images", "user-videos"]\f[] .SS extractor.civitai.metadata .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Example:" 4 .br * "generation,post,version" .br * ["version", "generation"] .IP "Description:" 4 Extract additional \f[I]generation\f[], \f[I]version\f[], and \f[I]post\f[] metadata. Note: This requires 1 or more additional API requests per image or video. 
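.IP "Example:" 4
Illustrative configuration file sketch combining the \f[I]civitai\f[] options above (the \f[I]api-key\f[] value is a placeholder):
.br
{"extractor": {"civitai": {"api-key": "XXXXXXXXXXXXXXXX", "include": "all", "metadata": ["generation", "version"]}}}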
.SS extractor.civitai.nsfw .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] (\f[I]"api": "rest"\f[]) .br * \f[I]integer\f[] (\f[I]"api": "trpc"\f[]) .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download NSFW-rated images. .br * For \f[I]"api": "rest"\f[], this can be one of \f[I]"None"\f[], \f[I]"Soft"\f[], \f[I]"Mature"\f[], \f[I]"X"\f[] to set the highest returned mature content flag. .br * For \f[I]"api": "trpc"\f[], this can be an \f[I]integer\f[] whose bits select the returned mature content flags. For example, \f[I]28\f[] (\f[I]4|8|16\f[]) would return only \f[I]R\f[], \f[I]X\f[], and \f[I]XXX\f[] rated images, while \f[I]3\f[] (\f[I]1|2\f[]) would return only \f[I]None\f[] and \f[I]Soft\f[] rated images. .SS extractor.civitai.quality .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"original=true"\f[] .IP "Example:" 4 .br * "width=1280,quality=90" .br * ["width=1280", "quality=90"] .IP "Description:" 4 A (comma-separated) list of image quality options to pass with every image URL. Known available options include \f[I]original\f[], \f[I]quality\f[], \f[I]width\f[]. Note: Set this option to an arbitrary letter, e.g., \f[I]"w"\f[], to download images in JPEG format at their original resolution. .SS extractor.civitai.quality-videos .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"quality=100"\f[] .IP "Example:" 4 .br * "+transcode=true,quality=100" .br * ["+", "transcode=true", "quality=100"] .IP "Description:" 4 A (comma-separated) list of video quality options to pass with every video URL. Known available options include \f[I]original\f[], \f[I]quality\f[], \f[I]transcode\f[]. Use \f[I]+\f[] as first character to add the given options to the \f[I]quality\f[] ones. .SS extractor.cyberdrop.domain .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Example:" 4 "cyberdrop.to" .IP "Description:" 4 Specifies the domain used by \f[I]cyberdrop\f[] regardless of input URL. Setting this option to \f[I]"auto"\f[] uses the same domain as a given input URL. .SS extractor.[Danbooru].external .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 For unavailable or restricted posts, follow the \f[I]source\f[] and download from there if possible. .SS extractor.[Danbooru].favgroup.order-posts .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"pool"\f[] .IP "Description:" 4 Controls the order in which \f[I]pool\f[]/\f[I]favgroup\f[] posts are returned. \f[I]"pool"\f[] | \f[I]"pool_asc"\f[] | \f[I]"asc"\f[] | \f[I]"asc_pool"\f[] Pool order \f[I]"pool_desc"\f[] | \f[I]"desc_pool"\f[] | \f[I]"desc"\f[] Reverse Pool order \f[I]"id"\f[] | \f[I]"id_desc"\f[] | \f[I]"desc_id"\f[] Descending Post ID order \f[I]"id_asc"\f[] | \f[I]"asc_id"\f[] Ascending Post ID order .SS extractor.[Danbooru].ugoira .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Controls the download target for Ugoira posts. 
.br * \f[I]true\f[]: Original ZIP archives .br * \f[I]false\f[]: Converted video files .SS extractor.[Danbooru].metadata .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Example:" 4 .br * "replacements,comments,ai_tags" .br * ["replacements", "comments", "ai_tags"] .IP "Description:" 4 Extract additional metadata (notes, artist commentary, parent, children, uploader) It is possible to specify a custom list of metadata includes. See \f[I]available_includes\f[] for possible field names. \f[I]aibooru\f[] also supports \f[I]ai_metadata\f[]. Note: This requires 1 additional HTTP request per 200-post batch. .SS extractor.[Danbooru].threshold .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]integer\f[] .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 Stop paginating over API results if the length of a batch of returned posts is less than the specified number. Defaults to the per-page limit of the current instance, which is 200. Note: Changing this setting is normally not necessary. When the value is greater than the per-page limit, gallery-dl will stop after the first batch. The value cannot be less than 1. .SS extractor.dankefuerslesen.zip .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download each chapter as a single ZIP archive instead of individual images. .SS extractor.deviantart.auto-watch .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Automatically watch users when encountering "Watchers-Only Deviations" (requires a \f[I]refresh-token\f[]). .SS extractor.deviantart.auto-unwatch .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 After watching a user through \f[I]auto-watch\f[], unwatch that user at the end of the current extractor run. .SS extractor.deviantart.comments .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract \f[I]comments\f[] metadata. .SS extractor.deviantart.comments-avatars .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download the avatar of each commenting user. Note: Enabling this option also enables deviantart.comments_. .SS extractor.deviantart.extra .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download extra Sta.sh resources from description texts and journals. Note: Enabling this option also enables deviantart.metadata_. .SS extractor.deviantart.flat .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Select the directory structure created by the Gallery- and Favorite-Extractors. .br * \f[I]true\f[]: Use a flat directory structure. .br * \f[I]false\f[]: Collect a list of all gallery-folders or favorites-collections and transfer any further work to other extractors (\f[I]folder\f[] or \f[I]collection\f[]), which will then create individual subdirectories for each of them. Note: Going through all gallery folders will not be able to fetch deviations which aren't in any folder. .SS extractor.deviantart.folders .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Provide a \f[I]folders\f[] metadata field that contains the names of all folders a deviation is present in. Note: Gathering this information requires a lot of API calls. Use with caution. 
.SS extractor.deviantart.group .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Check whether the profile name in a given URL belongs to a group or a regular user. When disabled, assume every given profile name belongs to a regular user. Special values: .br * \f[I]"skip"\f[]: Skip groups .SS extractor.deviantart.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"gallery"\f[] .IP "Example:" 4 .br * "favorite,journal,scraps" .br * ["favorite", "journal", "scraps"] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"avatar"\f[], \f[I]"background"\f[], \f[I]"gallery"\f[], \f[I]"scraps"\f[], \f[I]"journal"\f[], \f[I]"favorite"\f[], \f[I]"status"\f[]. It is possible to use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.deviantart.intermediary .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 For older non-downloadable images, download a higher-quality \f[I]/intermediary/\f[] version. .SS extractor.deviantart.journals .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"html"\f[] .IP "Description:" 4 Selects the output format for textual content. This includes journals, literature and status updates. .br * \f[I]"html"\f[]: HTML with (roughly) the same layout as on DeviantArt. .br * \f[I]"text"\f[]: Plain text with image references and HTML tags removed. .br * \f[I]"none"\f[]: Don't download textual content. .SS extractor.deviantart.jwt .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Update \f[I]JSON Web Tokens\f[] (the \f[I]token\f[] URL parameter) of otherwise non-downloadable, low-resolution images to be able to download them in full resolution. Note: No longer functional as of 2023-10-11 .SS extractor.deviantart.mature .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Enable mature content. This option simply sets the \f[I]mature_content\f[] parameter for API calls to either \f[I]"true"\f[] or \f[I]"false"\f[] and does not do any other form of content filtering. .SS extractor.deviantart.metadata .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Example:" 4 .br * "stats,submission" .br * ["camera", "stats", "submission"] .IP "Description:" 4 Extract additional metadata for deviation objects. Provides \f[I]description\f[], \f[I]tags\f[], \f[I]license\f[], and \f[I]is_watching\f[] fields when enabled. It is possible to request extended metadata by specifying a list of .br * \f[I]camera\f[] : EXIF information (if available) .br * \f[I]stats\f[] : deviation statistics .br * \f[I]submission\f[] : submission information .br * \f[I]collection\f[] : favourited folder information (requires a \f[I]refresh token\f[]) .br * \f[I]gallery\f[] : gallery folder information (requires a \f[I]refresh token\f[]) Set this option to \f[I]"all"\f[] to request all extended metadata categories. See \f[I]/deviation/metadata\f[] for official documentation. .SS extractor.deviantart.original .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download original files if available. Setting this option to \f[I]"images"\f[] only downloads original files if they are images and falls back to preview versions for everything else (archives, videos, etc.). 
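.IP "Example:" 4
Illustrative configuration file sketch using the \f[I]deviantart\f[] options documented above:
.br
{"extractor": {"deviantart": {"include": "gallery,scraps", "journals": "text", "metadata": ["camera", "stats"], "original": "images"}}}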
.SS extractor.deviantart.pagination .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"api"\f[] .IP "Description:" 4 Controls when to stop paginating over API results. .br * \f[I]"api"\f[]: Trust the API and stop when \f[I]has_more\f[] is \f[I]false\f[]. .br * \f[I]"manual"\f[]: Disregard \f[I]has_more\f[] and only stop when a batch of results is empty. .SS extractor.deviantart.previews .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 For non-image files (archives, videos, etc.), also download the file's preview image. Set this option to \f[I]"all"\f[] to download previews for all files. .SS extractor.deviantart.public .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Use a public access token for API requests. Disable this option to *force* using a private token for all requests when a \f[I]refresh token\f[] is provided. .SS extractor.deviantart.quality .IP "Type:" 6 .br * \f[I]integer\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]100\f[] .IP "Description:" 4 JPEG quality level of images for which an original file download is not available. Set this to \f[I]"png"\f[] to download a PNG version of these images instead. .SS extractor.deviantart.refresh-token .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 The \f[I]refresh-token\f[] value you get from \f[I]linking your DeviantArt account to gallery-dl\f[]. Using a \f[I]refresh-token\f[] allows you to access private or otherwise not publicly available deviations. Note: The \f[I]refresh-token\f[] becomes invalid \f[I]after 3 months\f[] or whenever your \f[I]cache file\f[] is deleted or cleared. .SS extractor.deviantart.wait-min .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]0\f[] .IP "Description:" 4 Minimum wait time in seconds before API requests. .SS extractor.deviantart.avatar.formats .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 ["original.jpg", "big.jpg", "big.gif", ".png"] .IP "Description:" 4 Avatar URL formats to return. Each format is parsed as \f[I]SIZE.EXT\f[]. .br Leave \f[I]SIZE\f[] empty to download the regular, small avatar format. .br .SS extractor.deviantart.folder.subfolders .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Also extract subfolder content. .SS extractor.discord.embeds .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["image", "gifv", "video"]\f[] .IP "Description:" 4 Selects which embed types to download from. Supported embed types are \f[I]image\f[], \f[I]gifv\f[], \f[I]video\f[], \f[I]rich\f[], \f[I]article\f[], \f[I]link\f[]. .SS extractor.discord.threads .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Extract threads from Discord text channels. .SS extractor.discord.token .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 Discord Bot Token for API requests. You can follow \f[I]this guide\f[] to get a token. .SS extractor.dynastyscans.anthology.metadata .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract \f[I]alert\f[], \f[I]description\f[], and \f[I]status\f[] metadata from an anthology's HTML page. .SS extractor.[E621].metadata .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Example:" 4 .br * "notes,pools" .br * ["notes", "pools"] .IP "Description:" 4 Extract additional metadata (notes, pool metadata) if available. Note: This requires 0-2 additional HTTP requests per post. 
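.IP "Example:" 4
Illustrative configuration file sketch, using \f[I]e621\f[] here as one instance of the \f[I][E621]\f[] category:
.br
{"extractor": {"e621": {"metadata": ["notes", "pools"]}}}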
.SS extractor.[E621].threshold .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]integer\f[] .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 Stop paginating over API results if the length of a batch of returned posts is less than the specified number. Defaults to the per-page limit of the current instance, which is 320. Note: Changing this setting is normally not necessary. When the value is greater than the per-page limit, gallery-dl will stop after the first batch. The value cannot be less than 1. .SS extractor.exhentai.domain .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 .br * \f[I]"auto"\f[]: Use \f[I]e-hentai.org\f[] or \f[I]exhentai.org\f[] depending on the input URL .br * \f[I]"e-hentai.org"\f[]: Use \f[I]e-hentai.org\f[] for all URLs .br * \f[I]"exhentai.org"\f[]: Use \f[I]exhentai.org\f[] for all URLs .SS extractor.exhentai.fallback-retries .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]2\f[] .IP "Description:" 4 Number of times a failed image gets retried or \f[I]-1\f[] for infinite retries. .SS extractor.exhentai.fav .IP "Type:" 6 \f[I]string\f[] .IP "Example:" 4 "4" .IP "Description:" 4 After downloading a gallery, add it to your account's favorites as the given category number. Note: Set this to "favdel" to remove galleries from your favorites. Note: This will remove any Favorite Notes when applied to already favorited galleries. .SS extractor.exhentai.gp .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"resized"\f[] .IP "Description:" 4 Selects how to handle "you do not have enough GP" errors. .br * "resized": Continue downloading \f[I]non-original\f[] images. .br * "stop": Stop the current extractor run. .br * "wait": Wait for user input before retrying the current image. .SS extractor.exhentai.limits .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Set a custom image download limit and perform \f[I]limits-action\f[] when it gets exceeded. .SS extractor.exhentai.limits-action .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"stop"\f[] .IP "Description:" 4 Action to perform when the image limit is exceeded. .br * "stop": Stop the current extractor run. .br * "wait": Wait for user input. .br * "reset": Spend GP to reset your account's image limits. .SS extractor.exhentai.metadata .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Load extended gallery metadata from the \f[I]API\f[]. .br * Adds \f[I]archiver_key\f[], \f[I]posted\f[], and \f[I]torrents\f[] .br * Provides exact \f[I]date\f[] and \f[I]filesize\f[] .SS extractor.exhentai.original .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download full-sized original images if available. .SS extractor.exhentai.source .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"gallery"\f[] .IP "Description:" 4 Selects an alternative source to download files from. .br * \f[I]"hitomi"\f[]: Download the corresponding gallery from \f[I]hitomi.la\f[] .br * \f[I]"metadata"\f[]: Load only a gallery's metadata from the \f[I]API\f[] .SS extractor.exhentai.tags .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Group \f[I]tags\f[] by type and provide them as \f[I]tags_\f[] metadata fields, for example \f[I]tags_artist\f[] or \f[I]tags_character\f[]. .SS extractor.facebook.author-followups .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "description:" 4 Extract comments that include photo attachments made by the author of the post. 
.SS extractor.facebook.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"photos"\f[] .IP "Example:" 4 .br * "avatar,photos" .br * ["avatar", "photos"] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Supported values are .br * \f[I]avatar\f[] .br * \f[I]photos\f[] It is possible to use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.facebook.videos .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Control video download behavior. .br * \f[I]true\f[]: Extract and download video & audio separately. .br * \f[I]"ytdl"\f[]: Let \f[I]ytdl\f[] handle video extraction and download, and merge video & audio streams. .br * \f[I]false\f[]: Ignore videos. .SS extractor.fanbox.comments .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract \f[I]comments\f[] metadata. Note: This requires 1 or more additional API requests per post, depending on the number of comments. .SS extractor.fanbox.embeds .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Control behavior on embedded content from external sites. .br * \f[I]true\f[]: Extract embed URLs and download them if supported (videos are not downloaded). .br * \f[I]"ytdl"\f[]: Like \f[I]true\f[], but let \f[I]ytdl\f[] handle video extraction and download for YouTube, Vimeo, and SoundCloud embeds. .br * \f[I]false\f[]: Ignore embeds. .SS extractor.fanbox.fee-max .IP "Type:" 6 \f[I]integer\f[] .IP "Description:" 4 Do not request API data or extract files from posts that require a fee (\f[I]feeRequired\f[]) greater than the specified amount. Note: This option has no effect on individual post URLs. .SS extractor.fanbox.metadata .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Example:" 4 .br * user,plan,comments .br * ["user", "plan", "comments"] .IP "Description:" 4 Extract \f[I]plan\f[] and extended \f[I]user\f[] metadata. Supported fields when selecting which data to extract are .br * \f[I]comments\f[] .br * \f[I]plan\f[] .br * \f[I]user\f[] Note: \f[I]comments\f[] can also be enabled via \f[I]fanbox.comments\f[] .SS extractor.flickr.access-token & .access-token-secret .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 The \f[I]access_token\f[] and \f[I]access_token_secret\f[] values you get from \f[I]linking your Flickr account to gallery-dl\f[]. .SS extractor.flickr.contexts .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 For each photo, return the albums and pools it belongs to as \f[I]set\f[] and \f[I]pool\f[] metadata. Note: This requires 1 additional API call per photo. See \f[I]flickr.photos.getAllContexts\f[] for details. .SS extractor.flickr.exif .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 For each photo, return its EXIF/TIFF/GPS tags as \f[I]exif\f[] and \f[I]camera\f[] metadata. Note: This requires 1 additional API call per photo. See \f[I]flickr.photos.getExif\f[] for details. .SS extractor.flickr.info .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 For each photo, retrieve its "full" metadata as provided by \f[I]flickr.photos.getInfo\f[] Note: This requires 1 additional API call per photo. 
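.IP "Example:" 4
Illustrative configuration file sketch for the \f[I]flickr\f[] options above (the token values are placeholders):
.br
{"extractor": {"flickr": {"access-token": "XXXX", "access-token-secret": "XXXX", "contexts": true, "exif": true}}}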
.SS extractor.flickr.metadata .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Example:" 4 .br * license,last_update,machine_tags .br * ["license", "last_update", "machine_tags"] .IP "Description:" 4 Extract additional metadata (license, date_taken, original_format, last_update, geo, machine_tags, o_dims) It is possible to specify a custom list of metadata includes. See \f[I]the extras parameter\f[] in \f[I]Flickr's API docs\f[] for possible field names. .SS extractor.flickr.profile .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract additional \f[I]user\f[] profile metadata. Note: This requires 1 additional API call per user profile. See \f[I]flickr.people.getInfo\f[] for details. .SS extractor.flickr.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Extract and download videos. .SS extractor.flickr.size-max .IP "Type:" 6 .br * \f[I]integer\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Sets the maximum allowed size for downloaded images. .br * If this is an \f[I]integer\f[], it specifies the maximum image dimension (width and height) in pixels. .br * If this is a \f[I]string\f[], it should be one of Flickr's format specifiers (\f[I]"Original"\f[], \f[I]"Large"\f[], ... or \f[I]"o"\f[], \f[I]"k"\f[], \f[I]"h"\f[], \f[I]"l"\f[], ...) to use as an upper limit. .SS extractor.furaffinity.descriptions .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"text"\f[] .IP "Description:" 4 Controls the format of \f[I]description\f[] metadata fields. .br * \f[I]"text"\f[]: Plain text with HTML tags removed .br * \f[I]"html"\f[]: Raw HTML content .SS extractor.furaffinity.external .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Follow external URLs linked in descriptions. .SS extractor.furaffinity.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"gallery"\f[] .IP "Example:" 4 .br * "scraps,favorite" .br * ["scraps", "favorite"] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"gallery"\f[], \f[I]"scraps"\f[], \f[I]"favorite"\f[]. It is possible to use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.furaffinity.layout .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 Selects which site layout to expect when parsing posts. .br * \f[I]"auto"\f[]: Automatically differentiate between \f[I]"old"\f[] and \f[I]"new"\f[] .br * \f[I]"old"\f[]: Expect the *old* site layout .br * \f[I]"new"\f[]: Expect the *new* site layout .SS extractor.gelbooru.api-key & .user-id .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Values from the API Access Credentials section found at the bottom of your \f[I]Account Options\f[] page. .SS extractor.gelbooru.favorite.order-posts .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"desc"\f[] .IP "Description:" 4 Controls the order in which favorited posts are returned. 
.br * \f[I]"asc"\f[]: Ascending favorite date order (oldest first) .br * \f[I]"desc"\f[]: Descending favorite date order (newest first) .br * \f[I]"reverse"\f[]: Same as \f[I]"asc"\f[] .SS extractor.generic.enabled .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Match **all** URLs not otherwise supported by gallery-dl, even ones without a \f[I]generic:\f[] prefix. .SS extractor.gofile.api-token .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 API token value found at the bottom of your \f[I]profile page\f[]. If not set, a temporary guest token will be used. .SS extractor.gofile.website-token .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 API token value used during API requests. An invalid or not up-to-date value will result in \f[I]401 Unauthorized\f[] errors. Keeping this option unset will use an extra HTTP request to attempt to fetch the current value used by gofile. .SS extractor.gofile.recursive .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Recursively download files from subfolders. .SS extractor.hentaifoundry.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"pictures"\f[] .IP "Example:" 4 .br * "scraps,stories" .br * ["scraps", "stories"] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"pictures"\f[], \f[I]"scraps"\f[], \f[I]"stories"\f[], \f[I]"favorite"\f[]. It is possible to use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.hitomi.format .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"webp"\f[] .IP "Description:" 4 Selects which image format to download. Available formats are \f[I]"webp"\f[] and \f[I]"avif"\f[]. .SS extractor.imagechest.access-token .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 Your personal Image Chest access token. These tokens allow using the API instead of having to scrape HTML pages, providing more detailed metadata. (\f[I]date\f[], \f[I]description\f[], etc.) See https://imgchest.com/docs/api/1.0/general/authorization for instructions on how to generate such a token. .SS extractor.imgur.client-id .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 Custom Client ID value for API requests. .SS extractor.imgur.mp4 .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls whether to choose the GIF or MP4 version of an animation. .br * \f[I]true\f[]: Follow Imgur's advice and choose MP4 if the \f[I]prefer_video\f[] flag in an image's metadata is set. .br * \f[I]false\f[]: Always choose GIF. .br * \f[I]"always"\f[]: Always choose MP4. .SS extractor.inkbunny.orderby .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"create_datetime"\f[] .IP "Description:" 4 Value of the \f[I]orderby\f[] parameter for submission searches. (See \f[I]API#Search\f[] for details) .SS extractor.instagram.api .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"rest"\f[] .IP "Description:" 4 Selects which API endpoints to use. .br * \f[I]"rest"\f[]: REST API - higher-resolution media .br * \f[I]"graphql"\f[]: GraphQL API - lower-resolution media .SS extractor.instagram.cursor .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Example:" 4 "3414259811154179155_25025320" .IP "Description:" 4 Controls the position from which to start the extraction process. \f[I]true\f[] Start from the beginning. 
.br Log the most recent \f[I]cursor\f[] value when interrupted before reaching the end. .br \f[I]false\f[] Start from the beginning. any \f[I]string\f[] Start from the position defined by this value. .SS extractor.instagram.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"posts"\f[] .IP "Example:" 4 .br * "stories,highlights,posts" .br * ["stories", "highlights", "posts"] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"posts"\f[], \f[I]"reels"\f[], \f[I]"tagged"\f[], \f[I]"stories"\f[], \f[I]"highlights"\f[], \f[I]"info"\f[], \f[I]"avatar"\f[]. It is possible to use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.instagram.max-posts .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Limit the number of posts to download. .SS extractor.instagram.metadata .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Provide extended \f[I]user\f[] metadata even when referring to a user by ID, e.g. \f[I]instagram.com/id:12345678\f[]. Note: This metadata is always available when referring to a user by name, e.g. \f[I]instagram.com/USERNAME\f[]. .SS extractor.instagram.order-files .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"asc"\f[] .IP "Description:" 4 Controls the order in which files of each post are returned. .br * \f[I]"asc"\f[]: Same order as displayed in a post .br * \f[I]"desc"\f[]: Reverse order as displayed in a post .br * \f[I]"reverse"\f[]: Same as \f[I]"desc"\f[] Note: This option does *not* affect \f[I]{num}\f[]. To enumerate files in reverse order, use \f[I]count - num + 1\f[]. .SS extractor.instagram.order-posts .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"asc"\f[] .IP "Description:" 4 Controls the order in which posts are returned. .br * \f[I]"asc"\f[]: Same order as displayed .br * \f[I]"desc"\f[]: Reverse order as displayed .br * \f[I]"id"\f[] or \f[I]"id_asc"\f[]: Ascending order by ID .br * \f[I]"id_desc"\f[]: Descending order by ID .br * \f[I]"reverse"\f[]: Same as \f[I]"desc"\f[] Note: This option only affects \f[I]highlights\f[]. .SS extractor.instagram.previews .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download video previews. .SS extractor.instagram.videos .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls video download behavior. \f[I]true\f[] | \f[I]"dash"\f[] | \f[I]"ytdl"\f[] Download videos from \f[I]video_dash_manifest\f[] data using \f[I]ytdl\f[] \f[I]"merged"\f[] Download pre-merged video formats \f[I]false\f[] Do not download videos .SS extractor.instagram.stories.split .IP "Type:" 6 .br * \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Split \f[I]stories\f[] elements into separate posts. .SS extractor.itaku.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"gallery"\f[] .IP "Example:" 4 .br * "stars,gallery" .br * ["stars", "gallery"] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Supported values are .br * \f[I]gallery\f[] .br * \f[I]posts\f[] .br * \f[I]followers\f[] .br * \f[I]following\f[] .br * \f[I]stars\f[] It is possible to use \f[I]"all"\f[] instead of listing all values separately. 
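.IP "Example:" 4
Illustrative configuration file sketch for the option above:
.br
{"extractor": {"itaku": {"include": ["gallery", "posts", "stars"]}}}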
.SS extractor.itaku.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download video files. .SS extractor.iwara.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["user-images", "user-videos"]\f[] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are .br * \f[I]"user-images"\f[] .br * \f[I]"user-videos"\f[] .br * \f[I]"user-playlists"\f[] It is possible to use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.kemono.archives .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract additional metadata for \f[I]archives\f[] files, including \f[I]file\f[], \f[I]file_list\f[], and \f[I]password\f[]. Note: This requires 1 additional HTTP request per \f[I]archives\f[] file. .SS extractor.kemono.comments .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract \f[I]comments\f[] metadata. Note: This requires 1 additional HTTP request per post. .SS extractor.kemono.duplicates .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Example:" 4 .br * "attachment,inline" .br * ["file", "attachment"] .IP "Description:" 4 Controls how to handle duplicate files in a post. \f[I]true\f[] Download duplicates \f[I]false\f[] Ignore duplicates any \f[I]list\f[] or \f[I]string\f[] Download a duplicate file if its \f[I]type\f[] is in the given list .br Ignore it otherwise .br .SS extractor.kemono.dms .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract a user's direct messages as \f[I]dms\f[] metadata. .SS extractor.kemono.announcements .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract a user's announcements as \f[I]announcements\f[] metadata. .SS extractor.kemono.endpoint .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"posts"\f[] .IP "Description:" 4 API endpoint to use for retrieving creator posts. \f[I]"legacy"\f[] Use the results from .br \f[I]/v1/{service}/user/{creator_id}/posts-legacy\f[] Provides less metadata, but is more reliable at returning all posts. .br Supports filtering results by \f[I]tag\f[] query parameter. .br \f[I]"legacy+"\f[] Use the results from .br \f[I]/v1/{service}/user/{creator_id}/posts-legacy\f[] to retrieve post IDs and one request to .br \f[I]/v1/{service}/user/{creator_id}/post/{post_id}\f[] to get a full set of metadata for each. \f[I]"posts"\f[] Use the results from .br \f[I]/v1/{service}/user/{creator_id}\f[] Provides more metadata, but might not return a creator's first/last posts. .br .SS extractor.kemono.favorites .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"artist"\f[] .IP "Description:" 4 Determines the type of favorites to be downloaded. Available types are \f[I]artist\f[], and \f[I]post\f[]. .SS extractor.kemono.files .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["attachments", "file", "inline"]\f[] .IP "Description:" 4 Determines the type and order of files to be downloaded. Available types are \f[I]file\f[], \f[I]attachments\f[], and \f[I]inline\f[]. .SS extractor.kemono.max-posts .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Limit the number of posts to download. 
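.IP "Example:" 4
Illustrative configuration file sketch combining the \f[I]kemono\f[] options above (the \f[I]max-posts\f[] value is an arbitrary example):
.br
{"extractor": {"kemono": {"endpoint": "legacy+", "files": ["file", "attachments"], "max-posts": 100}}}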
.SS extractor.kemono.metadata .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Extract \f[I]username\f[] and \f[I]user_profile\f[] metadata. .SS extractor.kemono.revisions .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract post revisions. Set this to \f[I]"unique"\f[] to filter out duplicate revisions. Note: This requires 1 additional HTTP request per post. .SS extractor.kemono.order-revisions .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"desc"\f[] .IP "Description:" 4 Controls the order in which \f[I]revisions\f[] are returned. .br * \f[I]"asc"\f[]: Ascending order (oldest first) .br * \f[I]"desc"\f[]: Descending order (newest first) .br * \f[I]"reverse"\f[]: Same as \f[I]"asc"\f[] .SS extractor.khinsider.covers .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download album cover images. .SS extractor.khinsider.format .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"mp3"\f[] .IP "Description:" 4 The name of the preferred file format to download. Use \f[I]"all"\f[] to download all available formats, or a (comma-separated) list to select multiple formats. If the selected format is not available, the first in the list gets chosen (usually mp3). .SS extractor.schalenetwork.cbz .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download each gallery as a single \f[I].cbz\f[] file. Disabling this option causes a gallery to be downloaded as individual image files. .SS extractor.schalenetwork.format .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["0", "1600", "1280", "980", "780"]\f[] .IP "Description:" 4 Name(s) of the image format to download. When more than one format is given, the first available one is selected. Possible formats are .br \f[I]"780"\f[], \f[I]"980"\f[], \f[I]"1280"\f[], \f[I]"1600"\f[], \f[I]"0"\f[] (original) .br .SS extractor.schalenetwork.tags .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Group \f[I]tags\f[] by type and provide them as \f[I]tags_\f[] metadata fields, for example \f[I]tags_artist\f[] or \f[I]tags_character\f[]. .SS extractor.lolisafe.domain .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Specifies the domain used by a \f[I]lolisafe\f[] extractor regardless of input URL. Setting this option to \f[I]"auto"\f[] uses the same domain as a given input URL. .SS extractor.luscious.gif .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Format in which to download animated images. Use \f[I]true\f[] to download animated images as gifs and \f[I]false\f[] to download as mp4 videos. .SS extractor.mangadex.api-server .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"https://api.mangadex.org"\f[] .IP "Description:" 4 The server to use for API requests. .SS extractor.mangadex.api-parameters .IP "Type:" 6 \f[I]object\f[] (name -> value) .IP "Example:" 4 {"order[updatedAt]": "desc"} .IP "Description:" 4 Additional query parameters to send when fetching manga chapters. (See \f[I]/manga/{id}/feed\f[] and \f[I]/user/follows/manga/feed\f[]) .SS extractor.mangadex.lang .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 .br * "en" .br * "fr,it" .br * ["fr", "it"] .IP "Description:" 4 \f[I]ISO 639-1\f[] language codes to filter chapters by. 
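.IP "Example:" 4
Illustrative configuration file sketch combining \f[I]lang\f[] with the \f[I]api-parameters\f[] example above:
.br
{"extractor": {"mangadex": {"lang": ["en"], "api-parameters": {"order[updatedAt]": "desc"}}}}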
.SS extractor.mangadex.ratings .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["safe", "suggestive", "erotica", "pornographic"]\f[] .IP "Example:" 4 .br * "safe" .br * "erotica,suggestive" .br * ["erotica", "suggestive"] .IP "Description:" 4 List of acceptable content ratings for returned chapters. .SS extractor.mangapark.source .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]integer\f[] .IP "Example:" 4 .br * "koala:en" .br * 15150116 .IP "Description:" 4 Select chapter source and language for a manga. The general syntax is \f[I]"<source name>:<ISO 639-1 language code>"\f[]. .br Both are optional, meaning \f[I]"koala"\f[], \f[I]"koala:"\f[], \f[I]":en"\f[], .br or even just \f[I]":"\f[] are possible as well. Specifying the numeric \f[I]ID\f[] of a source is also supported. .SS extractor.[mastodon].access-token .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 The \f[I]access-token\f[] value you get from \f[I]linking your account to gallery-dl\f[]. Note: gallery-dl comes with built-in tokens for \f[I]mastodon.social\f[], \f[I]pawoo\f[] and \f[I]baraag\f[]. For other instances, you need to obtain an \f[I]access-token\f[] in order to use usernames in place of numerical user IDs. .SS extractor.[mastodon].cards .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch media from cards. .SS extractor.[mastodon].reblogs .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch media from reblogged posts. .SS extractor.[mastodon].replies .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Fetch media from replies to other posts. .SS extractor.[mastodon].text-posts .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Also emit metadata for text-only posts without media content. .SS extractor.[misskey].access-token .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 Your access token, necessary to fetch favorited notes. .SS extractor.[misskey].include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"notes"\f[] .IP "Example:" 4 .br * "avatar,background,notes" .br * ["avatar", "background", "notes"] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"info"\f[], \f[I]"avatar"\f[], \f[I]"background"\f[], \f[I]"notes"\f[]. It is possible to use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.[misskey].renotes .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch media from renoted notes. .SS extractor.[misskey].replies .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Fetch media from replies to other notes. .SS extractor.[moebooru].pool.metadata .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract extended \f[I]pool\f[] metadata. Note: Not supported by all \f[I]moebooru\f[] instances. .SS extractor.naver-blog.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download videos. .SS extractor.naver-chzzk.offset .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]0\f[] .IP "Description:" 4 Custom \f[I]offset\f[] starting value when paginating over comments. .SS extractor.newgrounds.flash .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download original Adobe Flash animations instead of pre-rendered videos. 
.SS extractor.newgrounds.format .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]string\f[] .IP "Default:" 9 \f[I]"original"\f[] .IP "Example:" 4 .br * "720p" .br * ["mp4", "mov", "1080p", "720p"] .IP "Description:" 4 Selects the preferred format for video downloads. If the selected format is not available, the next smaller one gets chosen. If this is a \f[I]list\f[], try each given filename extension in original resolution or recoded format until an available format is found. .SS extractor.newgrounds.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"art"\f[] .IP "Example:" 4 .br * "movies,audio" .br * ["movies", "audio"] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"art"\f[], \f[I]"audio"\f[], \f[I]"games"\f[], \f[I]"movies"\f[]. It is possible to use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.nijie.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"illustration,doujin"\f[] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"illustration"\f[], \f[I]"doujin"\f[], \f[I]"favorite"\f[], \f[I]"nuita"\f[]. It is possible to use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.nitter.quoted .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch media from quoted Tweets. .SS extractor.nitter.retweets .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch media from Retweets. .SS extractor.nitter.videos .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Control video download behavior. .br * \f[I]true\f[]: Download videos .br * \f[I]"ytdl"\f[]: Download videos using \f[I]ytdl\f[] .br * \f[I]false\f[]: Skip video Tweets .SS extractor.oauth.browser .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls how a user is directed to an OAuth authorization page. .br * \f[I]true\f[]: Use Python's \f[I]webbrowser.open()\f[] method to automatically open the URL in the user's default browser. .br * \f[I]false\f[]: Ask the user to copy & paste an URL from the terminal. .SS extractor.oauth.cache .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Store tokens received during OAuth authorizations in \f[I]cache\f[]. .SS extractor.oauth.host .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"localhost"\f[] .IP "Description:" 4 Host name / IP address to bind to during OAuth authorization. .SS extractor.oauth.port .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]6414\f[] .IP "Description:" 4 Port number to listen on during OAuth authorization. Note: All redirects will go to port \f[I]6414\f[], regardless of the port specified here. You'll have to manually adjust the port number in your browser's address bar when using a different port than the default. .SS extractor.paheal.metadata .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract additional metadata (\f[I]source\f[], \f[I]uploader\f[]) Note: This requires 1 additional HTTP request per post. 
.SS extractor.patreon.cursor .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Example:" 4 "03:eyJ2IjoxLCJjIjoiMzU0NDQ1MjAiLCJ0IjoiIn0=:DTcmjBoVj01o_492YBYqHhqx" .IP "Description:" 4 Controls the position from which to start the extraction process. \f[I]true\f[] Start from the beginning. .br Log the most recent \f[I]cursor\f[] value when interrupted before reaching the end. .br \f[I]false\f[] Start from the beginning. any \f[I]string\f[] Start from the position defined by this value. .SS extractor.patreon.files .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["images", "image_large", "attachments", "postfile", "content"]\f[] .IP "Description:" 4 Determines types and order of files to download. Available types: .br * \f[I]postfile\f[] .br * \f[I]images\f[] .br * \f[I]image_large\f[] .br * \f[I]attachments\f[] .br * \f[I]content\f[] .SS extractor.patreon.format-images .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"download_url"\f[] .IP "Description:" 4 Selects the format of \f[I]images\f[] \f[I]files\f[]. Possible formats: .br * \f[I]download_url\f[] (\f[I]"a":1,"p":1\f[]) .br * \f[I]url\f[] (\f[I]"w":620\f[]) .br * \f[I]original\f[] (\f[I]"q":100,"webp":0\f[]) .br * \f[I]default\f[] (\f[I]"w":620\f[]) .br * \f[I]default_small\f[] (\f[I]"w":360\f[]) .br * \f[I]default_blurred\f[] (\f[I]"w":620\f[]) .br * \f[I]default_blurred_small\f[] (\f[I]"w":360\f[]) .br * \f[I]thumbnail\f[] (\f[I]"h":360,"w":360\f[]) .br * \f[I]thumbnail_large\f[] (\f[I]"h":1080,"w":1080\f[]) .br * \f[I]thumbnail_small\f[] (\f[I]"h":100,"w":100\f[]) .SS extractor.patreon.user.date-max .IP "Type:" 6 \f[I]Date\f[] .IP "Default:" 9 \f[I]0\f[] .IP "Description:" 4 Sets the \f[I]Date\f[] to start from. .SS extractor.[philomena].api-key .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Your account's API Key, to use your personal browsing settings and filters. .SS extractor.[philomena].filter .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 .br * \f[I]derpibooru\f[]: \f[I]56027\f[] (\f[I]Everything\f[] filter) .br * \f[I]ponybooru\f[]: \f[I]3\f[] (\f[I]Nah.\f[] filter) .br * otherwise: \f[I]2\f[] .IP "Description:" 4 The content filter ID to use. Setting an explicit filter ID overrides any default filters and can be used to access 18+ content without an \f[I]API Key\f[]. See \f[I]Filters\f[] for details. .SS extractor.[philomena].svg .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download SVG versions of images when available. Try to download the \f[I]view_url\f[] version of these posts when this option is disabled. .SS extractor.pillowfort.external .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Follow links to external sites, e.g. Twitter. .SS extractor.pillowfort.inline .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Extract inline images. .SS extractor.pillowfort.reblogs .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract media from reblogged posts. .SS extractor.pinterest.domain .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 Specifies the domain used by \f[I]pinterest\f[] extractors. Setting this option to \f[I]"auto"\f[] uses the same domain as a given input URL. .SS extractor.pinterest.sections .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Include pins from board sections. 
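.IP "Example:" 4
Illustrative configuration file sketch for the \f[I]pinterest\f[] options above:
.br
{"extractor": {"pinterest": {"domain": "auto", "sections": false}}}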
.SS extractor.pinterest.stories .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Extract files from story pins. .SS extractor.pinterest.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download from video pins. .SS extractor.pixeldrain.api-key .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 Your account's \f[I]API key\f[]. .SS extractor.pixeldrain.recursive .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Recursively download files from subfolders. .SS extractor.pixiv.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"artworks"\f[] .IP "Example:" 4 .br * "avatar,background,artworks" .br * ["avatar", "background", "artworks"] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"artworks"\f[], \f[I]"avatar"\f[], \f[I]"background"\f[], \f[I]"favorite"\f[], \f[I]"novel-user"\f[], \f[I]"novel-bookmark"\f[]. It is possible to use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.pixiv.refresh-token .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 The \f[I]refresh-token\f[] value you get from running \f[I]gallery-dl oauth:pixiv\f[] (see OAuth) or by using a third-party tool like \f[I]gppt\f[]. .SS extractor.pixiv.metadata .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch extended \f[I]user\f[] metadata. .SS extractor.pixiv.metadata-bookmark .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 For works bookmarked by \f[I]your own account\f[], fetch bookmark tags as \f[I]tags_bookmark\f[] metadata. Note: This requires 1 additional API request per bookmarked post. .SS extractor.pixiv.captions .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 For works with seemingly empty \f[I]caption\f[] metadata, try to grab the actual \f[I]caption\f[] value using the AJAX API. .SS extractor.pixiv.comments .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch \f[I]comments\f[] metadata. Note: This requires 1 or more additional API requests per post, depending on the number of comments. .SS extractor.pixiv.work.related .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Also download related artworks. .SS extractor.pixiv.tags .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"japanese"\f[] .IP "Description:" 4 Controls the \f[I]tags\f[] metadata field. .br * "japanese": List of Japanese tags .br * "translated": List of translated tags .br * "original": Unmodified list with both Japanese and translated tags .SS extractor.pixiv.ugoira .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download Pixiv's Ugoira animations. These animations come as a \f[I].zip\f[] archive containing all animation frames in JPEG format by default. Set this option to \f[I]"original"\f[] to download them as individual, higher-quality frames. Use an ugoira post processor to convert them to watchable animations. .SS extractor.pixiv.max-posts .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]0\f[] .IP "Description:" 4 When downloading galleries, this sets the maximum number of posts to get. A value of \f[I]0\f[] means no limit. 
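.IP "Example:" 4
Illustrative configuration file sketch combining several \f[I]pixiv\f[] options above (the \f[I]refresh-token\f[] value is a placeholder and \f[I]max-posts\f[] is an arbitrary example):
.br
{"extractor": {"pixiv": {"refresh-token": "XXXXXXXXXXXXXXXX", "ugoira": "original", "tags": "translated", "max-posts": 50}}}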
.SS extractor.pixiv.sanity .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Try to fetch \f[I]limit_sanity_level\f[] works via web API. .SS extractor.pixiv-novel.comments .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch \f[I]comments\f[] metadata. Note: This requires 1 or more additional API requests per novel, depending on the number of comments. .SS extractor.pixiv-novel.covers .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download cover images. .SS extractor.pixiv-novel.embeds .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download embedded images. .SS extractor.pixiv-novel.full-series .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 When downloading a novel that is part of a series, download all novels of that series. .SS extractor.pixiv-novel.max-posts .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]0\f[] .IP "Description:" 4 When downloading multiple novels, this sets the maximum number of novels to get. A value of \f[I]0\f[] means no limit. .SS extractor.pixiv-novel.metadata .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch extended \f[I]user\f[] metadata. .SS extractor.pixiv-novel.metadata-bookmark .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 For novels bookmarked by \f[I]your own account\f[], fetch bookmark tags as \f[I]tags_bookmark\f[] metadata. Note: This requires 1 additional API request per bookmarked post. .SS extractor.pixiv-novel.refresh-token .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 The \f[I]refresh-token\f[] value you get from running \f[I]gallery-dl oauth:pixiv\f[] (see OAuth) or by using a third-party tool like \f[I]gppt\f[]. This can be the same value as \f[I]extractor.pixiv.refresh-token\f[]. .SS extractor.pixiv-novel.tags .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"japanese"\f[] .IP "Description:" 4 Controls the \f[I]tags\f[] metadata field. .br * "japanese": List of Japanese tags .br * "translated": List of translated tags .br * "original": Unmodified list with both Japanese and translated tags .SS extractor.plurk.comments .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Also search Plurk comments for URLs. .SS extractor.[postmill].save-link-post-body .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Whether or not to save the body for link/image posts. .SS extractor.reactor.gif .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Format in which to download animated images. Use \f[I]true\f[] to download animated images as gifs and \f[I]false\f[] to download as mp4 videos. .SS extractor.readcomiconline.captcha .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"stop"\f[] .IP "Description:" 4 Controls how to handle redirects to CAPTCHA pages. .br * \f[I]"stop"\f[]: Stop the current extractor run. .br * \f[I]"wait"\f[]: Ask the user to solve the CAPTCHA and wait. .SS extractor.readcomiconline.quality .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 Sets the \f[I]quality\f[] query parameter of issue pages. (\f[I]"lq"\f[] or \f[I]"hq"\f[]) \f[I]"auto"\f[] uses the quality parameter of the input URL or \f[I]"hq"\f[] if not present. 
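.IP "Example:" 4
Illustrative configuration file sketch for the \f[I]readcomiconline\f[] options above:
.br
{"extractor": {"readcomiconline": {"captcha": "wait", "quality": "hq"}}}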
.SS extractor.reddit.comments .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]0\f[] .IP "Description:" 4 The value of the \f[I]limit\f[] parameter when loading a submission and its comments. This number (roughly) specifies the total amount of comments being retrieved with the first API call. Reddit's internal default and maximum values for this parameter appear to be 200 and 500 respectively. The value \f[I]0\f[] ignores all comments and significantly reduces the time required when scanning a subreddit. .SS extractor.reddit.morecomments .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Retrieve additional comments by resolving the \f[I]more\f[] comment stubs in the base comment tree. Note: This requires 1 additional API call for every 100 extra comments. .SS extractor.reddit.embeds .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download embedded comments media. .SS extractor.reddit.date-min & .date-max .IP "Type:" 6 \f[I]Date\f[] .IP "Default:" 9 \f[I]0\f[] and \f[I]253402210800\f[] (timestamp of \f[I]datetime.max\f[]) .IP "Description:" 4 Ignore all submissions posted before/after this date. .SS extractor.reddit.id-min & .id-max .IP "Type:" 6 \f[I]string\f[] .IP "Example:" 4 "6kmzv2" .IP "Description:" 4 Ignore all submissions posted before/after the submission with this ID. .SS extractor.reddit.previews .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 For failed downloads from external URLs / child extractors, download Reddit's preview image/video if available. .SS extractor.reddit.recursion .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]0\f[] .IP "Description:" 4 Reddit extractors can recursively visit other submissions linked to in the initial set of submissions. This value sets the maximum recursion depth. Special values: .br * \f[I]0\f[]: Recursion is disabled .br * \f[I]-1\f[]: Infinite recursion (don't do this) .SS extractor.reddit.refresh-token .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 The \f[I]refresh-token\f[] value you get from \f[I]linking your Reddit account to gallery-dl\f[]. Using a \f[I]refresh-token\f[] allows you to access private or otherwise not publicly available subreddits, given that your account is authorized to do so, but requests to the reddit API are going to be rate limited at 600 requests every 10 minutes/600 seconds. .SS extractor.reddit.selftext .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 .br * \f[I]true\f[] if \f[I]comments\f[] are enabled .br * \f[I]false\f[] otherwise .IP "Description:" 4 Follow links in the original post's \f[I]selftext\f[]. .SS extractor.reddit.videos .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Control video download behavior. .br * \f[I]true\f[]: Download videos and use \f[I]ytdl\f[] to handle HLS and DASH manifests .br * \f[I]"ytdl"\f[]: Download videos and let \f[I]ytdl\f[] handle all of video extraction and download .br * \f[I]"dash"\f[]: Extract DASH manifest URLs and use \f[I]ytdl\f[] to download and merge them. (*) .br * \f[I]false\f[]: Ignore videos (*) This saves 1 HTTP request per video and might potentially be able to download otherwise deleted videos, but it will not always get the best video quality available. 
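A sketch combining several of the Reddit options above, with illustrative values only: .. code:: json { "extractor": { "reddit": { "comments": 500, "morecomments": true, "videos": "dash", "previews": true } } }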
.SS extractor.redgifs.format .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["hd", "sd", "gif"]\f[] .IP "Description:" 4 List of names of the preferred animation format, which can be \f[I]"hd"\f[], \f[I]"sd"\f[], \f[I]"gif"\f[], \f[I]"thumbnail"\f[], \f[I]"vthumbnail"\f[], or \f[I]"poster"\f[]. If a selected format is not available, the next one in the list will be tried until an available format is found. If the format is given as \f[I]string\f[], it will be extended with \f[I]["hd", "sd", "gif"]\f[]. Use a list with one element to restrict it to only one possible format. .SS extractor.rule34xyz.format .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["10", "40", "41", "2"]\f[] .IP "Example:" 4 "33,34,4" .IP "Description:" 4 Selects the file format to extract. When more than one format is given, the first available one is selected. .SS extractor.sankaku.refresh .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Refresh download URLs before they expire. .SS extractor.sankaku.tags .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Group \f[I]tags\f[] by type and .br provide them as \f[I]tags_TYPE\f[] and \f[I]tag_string_TYPE\f[] metadata fields, for example \f[I]tags_artist\f[] and \f[I]tags_character\f[]. .br \f[I]true\f[] Enable general \f[I]tags\f[] categories Requires: .br * 1 additional API request per 100 tags per post \f[I]"extended"\f[] Group \f[I]tags\f[] by the new, extended tag category system used on \f[I]chan.sankakucomplex.com\f[] Requires: .br * 1 additional HTTP request per post .br * logged-in \f[I]cookies\f[] to fetch full \f[I]tags\f[] category data \f[I]false\f[] Disable \f[I]tags\f[] categories .SS extractor.sankakucomplex.embeds .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download video embeds from external sites. .SS extractor.sankakucomplex.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download videos. .SS extractor.sexcom.gifs .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download animated images as \f[I].gif\f[] instead of \f[I].webp\f[] .SS extractor.skeb.article .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download article images. .SS extractor.skeb.sent-requests .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download sent requests. .SS extractor.skeb.thumbnails .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download thumbnails. .SS extractor.skeb.search.filters .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["genre:art", "genre:voice", "genre:novel", "genre:video", "genre:music", "genre:correction"]\f[] .IP "Example:" 4 "genre:music OR genre:voice" .IP "Description:" 4 Filters used during searches. .SS extractor.smugmug.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download video files. .SS extractor.steamgriddb.animated .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Include animated assets when downloading from a list of assets. .SS extractor.steamgriddb.epilepsy .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Include assets tagged with epilepsy when downloading from a list of assets. 
.SS extractor.steamgriddb.dimensions .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"all"\f[] .IP "Examples:" 4 .br * \f[I]"1024x512,512x512"\f[] .br * \f[I]["460x215", "920x430"]\f[] .IP "Description:" 4 Only include assets that are in the specified dimensions. \f[I]all\f[] can be used to specify all dimensions. Valid values are: .br * Grids: \f[I]460x215\f[], \f[I]920x430\f[], \f[I]600x900\f[], \f[I]342x482\f[], \f[I]660x930\f[], \f[I]512x512\f[], \f[I]1024x1024\f[] .br * Heroes: \f[I]1920x620\f[], \f[I]3840x1240\f[], \f[I]1600x650\f[] .br * Logos: N/A (will be ignored) .br * Icons: \f[I]8x8\f[], \f[I]10x10\f[], \f[I]14x14\f[], \f[I]16x16\f[], \f[I]20x20\f[], \f[I]24x24\f[], \f[I]28x28\f[], \f[I]32x32\f[], \f[I]35x35\f[], \f[I]40x40\f[], \f[I]48x48\f[], \f[I]54x54\f[], \f[I]56x56\f[], \f[I]57x57\f[], \f[I]60x60\f[], \f[I]64x64\f[], \f[I]72x72\f[], \f[I]76x76\f[], \f[I]80x80\f[], \f[I]90x90\f[], \f[I]96x96\f[], \f[I]100x100\f[], \f[I]114x114\f[], \f[I]120x120\f[], \f[I]128x128\f[], \f[I]144x144\f[], \f[I]150x150\f[], \f[I]152x152\f[], \f[I]160x160\f[], \f[I]180x180\f[], \f[I]192x192\f[], \f[I]194x194\f[], \f[I]256x256\f[], \f[I]310x310\f[], \f[I]512x512\f[], \f[I]768x768\f[], \f[I]1024x1024\f[] .SS extractor.steamgriddb.file-types .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"all"\f[] .IP "Examples:" 4 .br * \f[I]"png,jpeg"\f[] .br * \f[I]["jpeg", "webp"]\f[] .IP "Description:" 4 Only include assets that are in the specified file types. \f[I]all\f[] can be used to specify all file types. Valid values are: .br * Grids: \f[I]png\f[], \f[I]jpeg\f[], \f[I]jpg\f[], \f[I]webp\f[] .br * Heroes: \f[I]png\f[], \f[I]jpeg\f[], \f[I]jpg\f[], \f[I]webp\f[] .br * Logos: \f[I]png\f[], \f[I]webp\f[] .br * Icons: \f[I]png\f[], \f[I]ico\f[] .SS extractor.steamgriddb.download-fake-png .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download fake PNGs alongside the real file. .SS extractor.steamgriddb.humor .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Include assets tagged with humor when downloading from a list of assets. .SS extractor.steamgriddb.languages .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"all"\f[] .IP "Examples:" 4 .br * \f[I]"en,km"\f[] .br * \f[I]["fr", "it"]\f[] .IP "Description:" 4 Only include assets that are in the specified languages. \f[I]all\f[] can be used to specify all languages. Valid values are \f[I]ISO 639-1\f[] language codes. .SS extractor.steamgriddb.nsfw .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Include assets tagged with adult content when downloading from a list of assets. .SS extractor.steamgriddb.sort .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]score_desc\f[] .IP "Description:" 4 Set the chosen sorting method when downloading from a list of assets. Can be one of: .br * \f[I]score_desc\f[] (Highest Score (Beta)) .br * \f[I]score_asc\f[] (Lowest Score (Beta)) .br * \f[I]score_old_desc\f[] (Highest Score (Old)) .br * \f[I]score_old_asc\f[] (Lowest Score (Old)) .br * \f[I]age_desc\f[] (Newest First) .br * \f[I]age_asc\f[] (Oldest First) .SS extractor.steamgriddb.static .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Include static assets when downloading from a list of assets. 
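As an illustrative sketch, restricting SteamGridDB downloads to two grid dimensions, PNG files, non-adult assets, and oldest-first sorting might be configured as: .. code:: json { "extractor": { "steamgriddb": { "dimensions": ["460x215", "920x430"], "file-types": "png", "nsfw": false, "sort": "age_asc" } } }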
.SS extractor.steamgriddb.styles .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]all\f[] .IP "Examples:" 4 .br * \f[I]white,black\f[] .br * \f[I]["no_logo", "white_logo"]\f[] .IP "Description:" 4 Only include assets that are in the specified styles. \f[I]all\f[] can be used to specify all styles. Valid values are: .br * Grids: \f[I]alternate\f[], \f[I]blurred\f[], \f[I]no_logo\f[], \f[I]material\f[], \f[I]white_logo\f[] .br * Heroes: \f[I]alternate\f[], \f[I]blurred\f[], \f[I]material\f[] .br * Logos: \f[I]official\f[], \f[I]white\f[], \f[I]black\f[], \f[I]custom\f[] .br * Icons: \f[I]official\f[], \f[I]custom\f[] .SS extractor.steamgriddb.untagged .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Include untagged assets when downloading from a list of assets. .SS extractor.[szurubooru].username & .token .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 Username and login token of your account to access private resources. To generate a token, visit \f[I]/user/USERNAME/list-tokens\f[] and click \f[I]Create Token\f[]. .SS extractor.tenor.format .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["gif", "mp4", "webm", "webp"]\f[] .IP "Description:" 4 List of names of the preferred animation format. If a selected format is not available, the next one in the list will be tried until a format is found. Possible formats include .br * \f[I]gif\f[] .br * \f[I]gif_transparent\f[] .br * \f[I]mediumgif\f[] .br * \f[I]gifpreview\f[] .br * \f[I]tinygif\f[] .br * \f[I]tinygif_transparent\f[] .br * \f[I]mp4\f[] .br * \f[I]tinymp4\f[] .br * \f[I]webm\f[] .br * \f[I]webp\f[] .br * \f[I]webp_transparent\f[] .br * \f[I]tinywebp\f[] .br * \f[I]tinywebp_transparent\f[] .SS extractor.tiktok.audio .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls audio download behavior. .br * \f[I]true\f[]: Download audio tracks .br * \f[I]"ytdl"\f[]: Download audio tracks using \f[I]ytdl\f[] .br * \f[I]false\f[]: Ignore audio tracks .SS extractor.tiktok.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download videos using \f[I]ytdl\f[]. .SS extractor.tiktok.user.avatar .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download user avatars. .SS extractor.tiktok.user.module .IP "Type:" 6 \f[I]Module\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 The \f[I]ytdl\f[] \f[I]Module\f[] to extract posts from a \f[I]tiktok\f[] user profile with. See \f[I]extractor.ytdl.module\f[]. .SS extractor.tiktok.user.tiktok-range .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]""\f[] .IP "Example:" 4 "1-20" .IP "Description:" 4 Range or playlist indices of \f[I]tiktok\f[] user posts to extract. See \f[I]ytdl/playlist_items\f[] for details. .SS extractor.tumblr.avatar .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download blog avatars. .SS extractor.tumblr.date-min & .date-max .IP "Type:" 6 \f[I]Date\f[] .IP "Default:" 9 \f[I]0\f[] and \f[I]null\f[] .IP "Description:" 4 Ignore all posts published before/after this date. .SS extractor.tumblr.external .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Follow external URLs (e.g. from "Link" posts) and try to extract images from them. 
.SS extractor.tumblr.inline .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Search posts for inline images and videos. .SS extractor.tumblr.offset .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]0\f[] .IP "Description:" 4 Custom \f[I]offset\f[] starting value when paginating over blog posts. Allows skipping over posts without having to waste API calls. .SS extractor.tumblr.original .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download full-resolution \f[I]photo\f[] and \f[I]inline\f[] images. For each photo with "maximum" resolution (width equal to 2048 or height equal to 3072) or each inline image, use an extra HTTP request to find the URL to its full-resolution version. .SS extractor.tumblr.pagination .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"offset"\f[] .IP "Description:" 4 Controls how to paginate over blog posts. .br * \f[I]"api"\f[]: \f[I]next\f[] parameter provided by the API (potentially misses posts due to a \f[I]bug\f[] in Tumblr's API) .br * \f[I]"before"\f[]: timestamp of last post .br * \f[I]"offset"\f[]: post offset number .SS extractor.tumblr.ratelimit .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"abort"\f[] .IP "Description:" 4 Selects how to handle exceeding the daily API rate limit. .br * \f[I]"abort"\f[]: Raise an error and stop extraction .br * \f[I]"wait"\f[]: Wait until rate limit reset .SS extractor.tumblr.reblogs .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 .br * \f[I]true\f[]: Extract media from reblogged posts .br * \f[I]false\f[]: Skip reblogged posts .br * \f[I]"same-blog"\f[]: Skip reblogged posts unless the original post is from the same blog .SS extractor.tumblr.posts .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"all"\f[] .IP "Example:" 4 .br * "video,audio,link" .br * ["video", "audio", "link"] .IP "Description:" 4 A (comma-separated) list of post types to extract images, etc. from. Possible types are \f[I]text\f[], \f[I]quote\f[], \f[I]link\f[], \f[I]answer\f[], \f[I]video\f[], \f[I]audio\f[], \f[I]photo\f[], \f[I]chat\f[]. It is possible to use \f[I]"all"\f[] instead of listing all types separately. .SS extractor.tumblr.fallback-delay .IP "Type:" 6 \f[I]float\f[] .IP "Default:" 9 \f[I]120.0\f[] .IP "Description:" 4 Number of seconds to wait between retries for fetching full-resolution images. .SS extractor.tumblr.fallback-retries .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]2\f[] .IP "Description:" 4 Number of retries for fetching full-resolution images or \f[I]-1\f[] for infinite retries. .SS extractor.twibooru.api-key .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Your \f[I]Twibooru API Key\f[], to use your account's browsing settings and filters. .SS extractor.twibooru.filter .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]2\f[] (\f[I]Everything\f[] filter) .IP "Description:" 4 The content filter ID to use. Setting an explicit filter ID overrides any default filters and can be used to access 18+ content without \f[I]API Key\f[]. See \f[I]Filters\f[] for details. .SS extractor.twibooru.svg .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download SVG versions of images when available. Try to download the \f[I]view_url\f[] version of these posts when this option is disabled. 
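For instance, a hedged Twibooru fragment that supplies an account API key (the value shown is a placeholder) and disables SVG downloads might read: .. code:: json { "extractor": { "twibooru": { "api-key": "YOUR_API_KEY", "svg": false } } }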
.SS extractor.twitter.ads .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch media from promoted Tweets. .SS extractor.twitter.cards .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Controls how to handle \f[I]Twitter Cards\f[]. .br * \f[I]false\f[]: Ignore cards .br * \f[I]true\f[]: Download image content from supported cards .br * \f[I]"ytdl"\f[]: Additionally download video content from unsupported cards using \f[I]ytdl\f[] .SS extractor.twitter.cards-blacklist .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 ["summary", "youtube.com", "player:twitch.tv"] .IP "Description:" 4 List of card types to ignore. Possible values are .br * card names .br * card domains .br * combined \f[I]name:domain\f[] values, e.g. \f[I]"player:twitch.tv"\f[] .SS extractor.twitter.conversations .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 For input URLs pointing to a single Tweet, e.g. https://twitter.com/i/web/status/, fetch media from all Tweets and replies in this \f[I]conversation\f[]. If this option is equal to \f[I]"accessible"\f[], only download from conversation Tweets if the given initial Tweet is accessible. .SS extractor.twitter.csrf .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"cookies"\f[] .IP "Description:" 4 Controls how to handle Cross Site Request Forgery (CSRF) tokens. .br * \f[I]"auto"\f[]: Always auto-generate a token. .br * \f[I]"cookies"\f[]: Use token given by the \f[I]ct0\f[] cookie if present. .SS extractor.twitter.cursor .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Example:" 4 "1/DAABCgABGVKi5lE___oKAAIYbfYNcxrQLggAAwAAAAIAAA" .IP "Description:" 4 Controls from which position to start the extraction process. \f[I]true\f[] Start from the beginning. .br Log the most recent \f[I]cursor\f[] value when interrupted before reaching the end. .br \f[I]false\f[] Start from the beginning. any \f[I]string\f[] Start from the position defined by this value. .IP "Note:" 4 A \f[I]cursor\f[] value from one timeline cannot be used with another. .SS extractor.twitter.expand .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 For each Tweet, return *all* Tweets from that initial Tweet's conversation or thread, i.e. *expand* all Twitter threads. Going through a timeline with this option enabled is essentially the same as running \f[I]gallery-dl https://twitter.com/i/web/status/\f[] with the \f[I]conversations\f[] option enabled for each Tweet in said timeline. Note: This requires at least 1 additional API call per initial Tweet. .SS extractor.twitter.unavailable .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Try to download media marked as \f[I]Unavailable\f[], e.g. \f[I]Geoblocked\f[] videos. .SS extractor.twitter.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"timeline"\f[] .IP "Example:" 4 .br * "avatar,background,media" .br * ["avatar", "background", "media"] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"info"\f[], \f[I]"avatar"\f[], \f[I]"background"\f[], \f[I]"timeline"\f[], \f[I]"tweets"\f[], \f[I]"media"\f[], \f[I]"replies"\f[], \f[I]"likes"\f[]. It is possible to use \f[I]"all"\f[] instead of listing all values separately.
.SS extractor.twitter.transform .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Transform Tweet and User metadata into a simpler, uniform format. .SS extractor.twitter.tweet-endpoint .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 Selects the API endpoint used to retrieve single Tweets. .br * \f[I]"restid"\f[]: \f[I]/TweetResultByRestId\f[] - accessible to guest users .br * \f[I]"detail"\f[]: \f[I]/TweetDetail\f[] - more stable .br * \f[I]"auto"\f[]: \f[I]"detail"\f[] when logged in, \f[I]"restid"\f[] otherwise .SS extractor.twitter.size .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["orig", "4096x4096", "large", "medium", "small"]\f[] .IP "Description:" 4 The image version to download. Any entries after the first one will be used for potential \f[I]fallback\f[] URLs. Known available sizes are .br * \f[I]orig\f[] .br * \f[I]large\f[] .br * \f[I]medium\f[] .br * \f[I]small\f[] .br * \f[I]4096x4096\f[] .br * \f[I]900x900\f[] .br * \f[I]360x360\f[] .SS extractor.twitter.logout .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Logout and retry as guest when access to another user's Tweets is blocked. .SS extractor.twitter.pinned .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch media from pinned Tweets. .SS extractor.twitter.quoted .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch media from quoted Tweets. If this option is enabled, gallery-dl will try to fetch a quoted (original) Tweet when it sees the Tweet which quotes it. .SS extractor.twitter.ratelimit .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"wait"\f[] .IP "Description:" 4 Selects how to handle exceeding the API rate limit. .br * \f[I]"abort"\f[]: Raise an error and stop extraction .br * \f[I]"wait"\f[]: Wait until rate limit reset .br * \f[I]"wait:N"\f[]: Wait for \f[I]N\f[] seconds .SS extractor.twitter.relogin .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 When receiving a "Could not authenticate you" error while logged in with \f[I]username & password\f[], refresh the current login session and try to continue from where it left off. .SS extractor.twitter.locked .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"abort"\f[] .IP "Description:" 4 Selects how to handle "account is temporarily locked" errors. .br * \f[I]"abort"\f[]: Raise an error and stop extraction .br * \f[I]"wait"\f[]: Wait until the account is unlocked and retry .SS extractor.twitter.replies .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Fetch media from replies to other Tweets. If this value is \f[I]"self"\f[], only consider replies where reply and original Tweet are from the same user. Note: Twitter will automatically expand conversations if you use the \f[I]/with_replies\f[] timeline while logged in. For example, media from Tweets which the user replied to will also be downloaded. It is possible to exclude unwanted Tweets using \f[I]image-filter \f[]. .SS extractor.twitter.retweets .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch media from Retweets. If this value is \f[I]"original"\f[], metadata for these files will be taken from the original Tweets, not the Retweets. 
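As a sketch of how several of the Twitter options above interact, with illustrative values only: .. code:: json { "extractor": { "twitter": { "include": "media", "retweets": "original", "replies": "self", "quoted": true, "ratelimit": "wait:900" } } }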
.SS extractor.twitter.timeline.strategy .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 Controls the strategy / tweet source used for timeline URLs (\f[I]https://twitter.com/USER/timeline\f[]). .br * \f[I]"tweets"\f[]: \f[I]/tweets\f[] timeline + search .br * \f[I]"media"\f[]: \f[I]/media\f[] timeline + search .br * \f[I]"with_replies"\f[]: \f[I]/with_replies\f[] timeline + search .br * \f[I]"auto"\f[]: \f[I]"tweets"\f[] or \f[I]"media"\f[], depending on \f[I]retweets\f[] and \f[I]text-tweets\f[] settings .SS extractor.twitter.text-tweets .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Also emit metadata for text-only Tweets without media content. This only has an effect with a \f[I]metadata\f[] (or \f[I]exec\f[]) post processor with \f[I]"event": "post"\f[] and appropriate \f[I]filename\f[]. .SS extractor.twitter.twitpic .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract \f[I]TwitPic\f[] embeds. .SS extractor.twitter.unique .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Ignore previously seen Tweets. .SS extractor.twitter.username-alt .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 Alternate identifier (username, email, phone number) when \f[I]logging in\f[]. When not specified and asked for by Twitter, this identifier will need to be entered in an interactive prompt. .SS extractor.twitter.users .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"user"\f[] .IP "Example:" 4 "https://twitter.com/search?q=from:{legacy[screen_name]}" .IP "Description:" 4 Format string for user URLs generated from .br \f[I]following\f[] and \f[I]list-members\f[] queries, whose replacement field values come from Twitter \f[I]user\f[] objects .br (\f[I]Example\f[]) Special values: .br * \f[I]"user"\f[]: \f[I]https://twitter.com/i/user/{rest_id}\f[] .br * \f[I]"timeline"\f[]: \f[I]https://twitter.com/id:{rest_id}/timeline\f[] .br * \f[I]"tweets"\f[]: \f[I]https://twitter.com/id:{rest_id}/tweets\f[] .br * \f[I]"media"\f[]: \f[I]https://twitter.com/id:{rest_id}/media\f[] Note: To allow gallery-dl to follow custom URL formats, set the \f[I]blacklist\f[] for \f[I]twitter\f[] to a non-default value, e.g. an empty string \f[I]""\f[]. .SS extractor.twitter.videos .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Control video download behavior. .br * \f[I]true\f[]: Download videos .br * \f[I]"ytdl"\f[]: Download videos using \f[I]ytdl\f[] .br * \f[I]false\f[]: Skip video Tweets .SS extractor.unsplash.format .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"raw"\f[] .IP "Description:" 4 Name of the image format to download. Available formats are \f[I]"raw"\f[], \f[I]"full"\f[], \f[I]"regular"\f[], \f[I]"small"\f[], and \f[I]"thumb"\f[]. .SS extractor.vipergirls.domain .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"viper.click"\f[] .IP "Description:" 4 Specifies the domain used by \f[I]vipergirls\f[] extractors. For example \f[I]"viper.click"\f[] if the main domain is blocked or to bypass Cloudflare. .SS extractor.vipergirls.like .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Automatically like posts after downloading their images. Note: Requires \f[I]login\f[] or \f[I]cookies\f[]. .SS extractor.vk.offset .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]0\f[] .IP "Description:" 4 Custom \f[I]offset\f[] starting value when paginating over image results.
.SS extractor.vsco.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"gallery"\f[] .IP "Example:" 4 .br * "avatar,collection" .br * ["avatar", "collection"] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"avatar"\f[], \f[I]"gallery"\f[], \f[I]"spaces"\f[], \f[I]"collection"\f[]. It is possible to use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.vsco.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download video files. .SS extractor.wallhaven.api-key .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Your \f[I]Wallhaven API Key\f[], to use your account's browsing settings and default filters when searching. See https://wallhaven.cc/help/api for more information. .SS extractor.wallhaven.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"uploads"\f[] .IP "Example:" 4 .br * "uploads,collections" .br * ["uploads", "collections"] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"uploads"\f[], \f[I]"collections"\f[]. It is possible to use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.wallhaven.metadata .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract additional metadata (tags, uploader). Note: This requires 1 additional HTTP request per post. .SS extractor.weasyl.api-key .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Your \f[I]Weasyl API Key\f[], to use your account's browsing settings and filters. .SS extractor.weasyl.metadata .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch extra submission metadata during gallery downloads. .br (\f[I]comments\f[], \f[I]description\f[], \f[I]favorites\f[], \f[I]folder_name\f[], .br \f[I]tags\f[], \f[I]views\f[]) Note: This requires 1 additional HTTP request per submission. .SS extractor.webtoons.quality .IP "Type:" 6 .br * \f[I]integer\f[] .br * \f[I]string\f[] .br * \f[I]object\f[] (ext -> type) .IP "Default:" 9 \f[I]"original"\f[] .IP "Example:" 4 .br * 90 .br * "q50" .br * {"jpg": "q80", "jpeg": "q80", "png": false} .IP "Description:" 4 Controls the quality of downloaded files by modifying URLs' \f[I]type\f[] parameter. \f[I]"original"\f[] Download minimally compressed versions of JPG files any \f[I]integer\f[] Use \f[I]"q"\f[] followed by this value as \f[I]type\f[] parameter for JPEG files any \f[I]string\f[] Use this value as \f[I]type\f[] parameter for JPEG files any \f[I]object\f[] Use the given values as \f[I]type\f[] parameter for URLs with the specified extensions .br - Set a value to \f[I]false\f[] to completely remove these extensions' \f[I]type\f[] parameter .br - Omit an extension to leave its URLs unchanged .br .SS extractor.webtoons.banners .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download the active comic's \f[I]banner\f[]. .SS extractor.webtoons.thumbnails .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download the active episode's \f[I]thumbnail\f[]. Useful for creating CBZ archives with actual source thumbnails. .SS extractor.weibo.gifs .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download \f[I]gif\f[] files.
Set this to \f[I]"video"\f[] to download GIFs as video files. .SS extractor.weibo.include .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"feed"\f[] .IP "Description:" 4 A (comma-separated) list of subcategories to include when processing a user profile. Possible values are \f[I]"home"\f[], \f[I]"feed"\f[], \f[I]"videos"\f[], \f[I]"newvideo"\f[], \f[I]"article"\f[], \f[I]"album"\f[]. It is possible to use \f[I]"all"\f[] instead of listing all values separately. .SS extractor.weibo.livephoto .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download \f[I]livephoto\f[] files. .SS extractor.weibo.movies .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download \f[I]movie\f[] videos. .SS extractor.weibo.retweets .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Fetch media from retweeted posts. If this value is \f[I]"original"\f[], metadata for these files will be taken from the original posts, not the retweeted posts. .SS extractor.weibo.videos .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Download video files. .SS extractor.wikimedia.limit .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]50\f[] .IP "Description:" 4 Number of results to return in a single API query. The value must be between 10 and 500. .SS extractor.wikimedia.subcategories .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 For \f[I]Category:\f[] pages, recursively descend into subcategories. .SS extractor.ytdl.cmdline-args .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 .br * "--quiet --write-sub --merge-output-format mkv" .br * ["--quiet", "--write-sub", "--merge-output-format", "mkv"] .IP "Description:" 4 Additional \f[I]ytdl\f[] options specified as command-line arguments. See \f[I]yt-dlp options\f[] / \f[I]youtube-dl options\f[] .SS extractor.ytdl.config-file .IP "Type:" 6 \f[I]Path\f[] .IP "Example:" 4 "~/.config/yt-dlp/config" .IP "Description:" 4 Location of a \f[I]ytdl\f[] configuration file to load options from. .SS extractor.ytdl.deprecations .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Allow \f[I]ytdl\f[] to warn about deprecated options and features. .SS extractor.ytdl.enabled .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Process URLs otherwise unsupported by gallery-dl with \f[I]ytdl\f[]. .SS extractor.ytdl.format .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 Default of the \f[I]ytdl\f[] \f[I]module\f[] used. .br (\f[I]"bestvideo*+bestaudio/best"\f[] for \f[I]yt_dlp\f[], .br \f[I]"bestvideo+bestaudio/best"\f[] for \f[I]youtube_dl\f[]) .IP "Description:" 4 \f[I]ytdl\f[] format selection string. See \f[I]yt-dlp format selection\f[] / \f[I]youtube-dl format selection\f[] .SS extractor.ytdl.generic .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Enables the use of \f[I]ytdl's\f[] \f[I]Generic\f[] extractor. Set this option to \f[I]"force"\f[] for the same effect as \f[I]--force-generic-extractor\f[]. .SS extractor.ytdl.generic-category .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 When using \f[I]ytdl's\f[] \f[I]Generic\f[] extractor, change category to \f[I]"ytdl-generic"\f[] and set subcategory to the input URL's domain.
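A minimal sketch enabling the ytdl extractor for otherwise unsupported URLs, assuming yt-dlp is installed: .. code:: json { "extractor": { "ytdl": { "enabled": true, "module": "yt-dlp", "format": "bestvideo*+bestaudio/best" } } }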
.SS extractor.ytdl.logging .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Route \f[I]ytdl's\f[] output through gallery-dl's logging system. Otherwise it will be written directly to stdout/stderr. Note: Set \f[I]quiet\f[] and \f[I]no_warnings\f[] in \f[I]extractor.ytdl.raw-options\f[] to \f[I]true\f[] to suppress all output. .SS extractor.ytdl.module .IP "Type:" 6 \f[I]Module\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Example:" 4 .br * "yt-dlp" .br * "/home/user/.local/lib/python3.13/site-packages/youtube_dl" .IP "Description:" 4 The \f[I]ytdl\f[] \f[I]Module\f[] to import. Setting this to \f[I]null\f[] will try to import \f[I]"yt_dlp"\f[] followed by \f[I]"youtube_dl"\f[] as fallback. .SS extractor.ytdl.raw-options .IP "Type:" 6 \f[I]object\f[] (name -> value) .IP "Example:" 4 .. code:: json { "quiet": true, "writesubtitles": true, "merge_output_format": "mkv" } .IP "Description:" 4 Additional options passed directly to the \f[I]YoutubeDL\f[] constructor. Available options can be found in \f[I]yt-dlp's docstrings\f[] / \f[I]youtube-dl's docstrings\f[] .SS extractor.zerochan.extensions .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]["jpg", "png", "webp", "gif"]\f[] .IP "Example:" 4 .br * "gif" .br * ["webp", "gif", "jpg"] .IP "Description:" 4 List of filename extensions to try when dynamically building download URLs (\f[I]"pagination": "api"\f[] + \f[I]"metadata": false\f[]) .SS extractor.zerochan.metadata .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract additional metadata (date, md5, tags, ...). Note: This requires 1-2 additional HTTP requests per post. .SS extractor.zerochan.pagination .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"api"\f[] .IP "Description:" 4 Controls how to paginate over tag search results. .br * \f[I]"api"\f[]: Use the \f[I]JSON API\f[] (no \f[I]extension\f[] metadata) .br * \f[I]"html"\f[]: Parse HTML pages (limited to 100 pages * 24 posts) .SS extractor.zerochan.redirects .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Automatically follow tag redirects. .SS extractor.[booru].tags .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Group \f[I]tags\f[] by type and provide them as \f[I]tags_TYPE\f[] metadata fields, for example \f[I]tags_artist\f[] or \f[I]tags_character\f[]. Note: This requires 1 additional HTTP request per post. .SS extractor.[booru].notes .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Extract overlay notes (position and text). Note: This requires 1 additional HTTP request per post. .SS extractor.[booru].url .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"file_url"\f[] .IP "Example:" 4 .br * "preview_url" .br * ["sample_url", "preview_url", "file_url"] .IP "Description:" 4 Alternate field name to retrieve download URLs from. When multiple names are given, download the first available one. .SS extractor.[manga-extractor].chapter-reverse .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Reverse the order of chapter URLs extracted from manga pages. .br * \f[I]true\f[]: Start with the latest chapter .br * \f[I]false\f[]: Start with the first chapter .SS extractor.[manga-extractor].page-reverse .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Download manga chapter pages in reverse order.
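As a sketch of the \f[I][booru]\f[] options above, using \f[I]gelbooru\f[] merely as a stand-in for any site covered by them: .. code:: json { "extractor": { "gelbooru": { "tags": true, "notes": true, "url": ["sample_url", "file_url"] } } }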
.SH DOWNLOADER OPTIONS .SS downloader.*.enabled .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Enable/Disable this downloader module. .SS downloader.*.filesize-min & .filesize-max .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Example:" 4 "32000", "500k", "2.5M" .IP "Description:" 4 Minimum/Maximum allowed file size in bytes. Any file smaller/larger than this limit will not be downloaded. Possible values are valid integer or floating-point numbers optionally followed by one of \f[I]k\f[], \f[I]m\f[], \f[I]g\f[], \f[I]t\f[], or \f[I]p\f[]. These suffixes are case-insensitive. .SS downloader.*.mtime .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Use \f[I]Last-Modified\f[] HTTP response headers to set file modification times. .SS downloader.*.part .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls the use of \f[I].part\f[] files during file downloads. .br * \f[I]true\f[]: Write downloaded data into \f[I].part\f[] files and rename them upon download completion. This mode additionally supports resuming incomplete downloads. .br * \f[I]false\f[]: Do not use \f[I].part\f[] files and write data directly into the actual output files. .SS downloader.*.part-directory .IP "Type:" 6 \f[I]Path\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 Alternate location for \f[I].part\f[] files. Missing directories will be created as needed. If this value is \f[I]null\f[], \f[I].part\f[] files are going to be stored alongside the actual output files. .SS downloader.*.progress .IP "Type:" 6 \f[I]float\f[] .IP "Default:" 9 \f[I]3.0\f[] .IP "Description:" 4 Number of seconds until a download progress indicator for the current download is displayed. Set this option to \f[I]null\f[] to disable this indicator. .SS downloader.*.rate .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] with 2 \f[I]strings\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Example:" 4 .br * "32000" .br * "500k" .br * "1M - 2.5M" .br * ["1M", "2.5M"] .IP "Description:" 4 Maximum download rate in bytes per second. Possible values are valid integer or floating-point numbers optionally followed by one of \f[I]k\f[], \f[I]m\f[], \f[I]g\f[], \f[I]t\f[], or \f[I]p\f[]. These suffixes are case-insensitive. If given as a range, the maximum download rate will be randomly chosen before each download. (see \f[I]random.randint()\f[]) .SS downloader.*.retries .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]extractor.*.retries\f[] .IP "Description:" 4 Maximum number of retries during file downloads, or \f[I]-1\f[] for infinite retries. .SS downloader.*.timeout .IP "Type:" 6 \f[I]float\f[] .IP "Default:" 9 \f[I]extractor.*.timeout\f[] .IP "Description:" 4 Connection timeout during file downloads. .SS downloader.*.verify .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]extractor.*.verify\f[] .IP "Description:" 4 Certificate validation during file downloads. .SS downloader.*.proxy .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]object\f[] (scheme -> proxy) .IP "Default:" 9 \f[I]extractor.*.proxy\f[] .IP "Description:" 4 Proxy server used for file downloads. Disable the use of a proxy for file downloads by explicitly setting this option to \f[I]null\f[]. .SS downloader.http.adjust-extensions .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Check file headers of downloaded files and adjust their filename extensions if they do not match.
For example, this will change the filename extension (\f[I]{extension}\f[]) of a file called \f[I]example.png\f[] from \f[I]png\f[] to \f[I]jpg\f[] when said file contains JPEG/JFIF data. .SS downloader.http.consume-content .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Controls the behavior when an HTTP response is considered unsuccessful. If the value is \f[I]true\f[], consume the response body. This avoids closing the connection and therefore improves connection reuse. If the value is \f[I]false\f[], immediately close the connection without reading the response. This can be useful if the server is known to send large bodies for error responses. .SS downloader.http.chunk-size .IP "Type:" 6 .br * \f[I]integer\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]32768\f[] .IP "Example:" 4 "50k", "0.8M" .IP "Description:" 4 Number of bytes per downloaded chunk. Possible values are integer numbers optionally followed by one of \f[I]k\f[], \f[I]m\f[], \f[I]g\f[], \f[I]t\f[], or \f[I]p\f[]. These suffixes are case-insensitive. .SS downloader.http.headers .IP "Type:" 6 \f[I]object\f[] (name -> value) .IP "Example:" 4 {"Accept": "image/webp,*/*", "Referer": "https://example.org/"} .IP "Description:" 4 Additional HTTP headers to send when downloading files. .SS downloader.http.retry-codes .IP "Type:" 6 \f[I]list\f[] of \f[I]integers\f[] .IP "Default:" 9 \f[I]extractor.*.retry-codes\f[] .IP "Description:" 4 Additional \f[I]HTTP response status codes\f[] to retry a download on. Codes \f[I]200\f[], \f[I]206\f[], and \f[I]416\f[] (when resuming a \f[I]partial\f[] download) will never be retried and always count as success, regardless of this option. \f[I]5xx\f[] codes (server error responses) will always be retried, regardless of this option. .SS downloader.http.sleep-429 .IP "Type:" 6 \f[I]Duration\f[] .IP "Default:" 9 \f[I]extractor.*.sleep-429\f[] .IP "Description:" 4 Number of seconds to sleep when receiving a 429 Too Many Requests response before \f[I]retrying\f[] the request. Note: Requires \f[I]retry-codes\f[] to include \f[I]429\f[]. .SS downloader.http.validate .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Check for invalid responses. Fail a download when a file does not pass instead of downloading a potentially broken file. .SS downloader.http.validate-html .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Check for unexpected HTML responses. Fail file downloads with a \f[I]text/html\f[] \f[I]Content-Type header\f[] when expecting a media file instead. .SS downloader.ytdl.cmdline-args .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 .br * "--quiet --write-sub --merge-output-format mkv" .br * ["--quiet", "--write-sub", "--merge-output-format", "mkv"] .IP "Description:" 4 Additional \f[I]ytdl\f[] options specified as command-line arguments. See \f[I]yt-dlp options\f[] / \f[I]youtube-dl options\f[] .SS downloader.ytdl.config-file .IP "Type:" 6 \f[I]Path\f[] .IP "Example:" 4 "~/.config/yt-dlp/config" .IP "Description:" 4 Location of a \f[I]ytdl\f[] configuration file to load options from. .SS downloader.ytdl.deprecations .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Allow \f[I]ytdl\f[] to warn about deprecated options and features. .SS downloader.ytdl.format .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 Default of the \f[I]ytdl\f[] \f[I]module\f[] used.
.br (\f[I]"bestvideo*+bestaudio/best"\f[] for \f[I]yt_dlp\f[], .br \f[I]"bestvideo+bestaudio/best"\f[] for \f[I]youtube_dl\f[]) .IP "Description:" 4 \f[I]ytdl\f[] format selection string. See \f[I]yt-dlp format selection\f[] / \f[I]youtube-dl format selection\f[] .SS downloader.ytdl.forward-cookies .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Forward gallery-dl's cookies to \f[I]ytdl\f[]. .SS downloader.ytdl.logging .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Route \f[I]ytdl's\f[] output through gallery-dl's logging system. Otherwise it will be written directly to stdout/stderr. Note: Set \f[I]quiet\f[] and \f[I]no_warnings\f[] in \f[I]downloader.ytdl.raw-options\f[] to \f[I]true\f[] to suppress all output. .SS downloader.ytdl.module .IP "Type:" 6 \f[I]Module\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Example:" 4 .br * "yt-dlp" .br * "/home/user/.local/lib/python3.13/site-packages/youtube_dl" .IP "Description:" 4 The \f[I]ytdl\f[] \f[I]Module\f[] to import. Setting this to \f[I]null\f[] will try to import \f[I]"yt_dlp"\f[] followed by \f[I]"youtube_dl"\f[] as fallback. .SS downloader.ytdl.outtmpl .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Description:" 4 The Output Template used to generate filenames for files downloaded with \f[I]ytdl\f[]. See \f[I]yt-dlp output template\f[] / \f[I]youtube-dl output template\f[]. Special values: .br * \f[I]null\f[]: generate filenames with \f[I]extractor.*.filename\f[] .br * \f[I]"default"\f[]: use \f[I]ytdl's\f[] default, currently \f[I]"%(title)s [%(id)s].%(ext)s"\f[] for \f[I]yt-dlp\f[] / \f[I]"%(title)s-%(id)s.%(ext)s"\f[] for \f[I]youtube-dl\f[] Note: An output template other than \f[I]null\f[] might cause unexpected results in combination with certain options (e.g. \f[I]"skip": "enumerate"\f[]) .SS downloader.ytdl.raw-options .IP "Type:" 6 \f[I]object\f[] (name -> value) .IP "Example:" 4 .. code:: json { "quiet": true, "writesubtitles": true, "merge_output_format": "mkv" } .IP "Description:" 4 Additional options passed directly to the \f[I]YoutubeDL\f[] constructor. Available options can be found in \f[I]yt-dlp's docstrings\f[] / \f[I]youtube-dl's docstrings\f[] .SH OUTPUT OPTIONS .SS output.mode .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]object\f[] (key -> format string) .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 Controls the output string format and status indicators. .br * \f[I]"null"\f[]: No output .br * \f[I]"pipe"\f[]: Suitable for piping to other processes or files .br * \f[I]"terminal"\f[]: Suitable for the standard Windows console .br * \f[I]"color"\f[]: Suitable for terminals that understand ANSI escape codes and colors .br * \f[I]"auto"\f[]: \f[I]"pipe"\f[] if not on a TTY, \f[I]"terminal"\f[] on Windows with \f[I]output.ansi\f[] disabled, \f[I]"color"\f[] otherwise. It is possible to use custom output format strings .br by setting this option to an \f[I]object\f[] and specifying \f[I]start\f[], \f[I]success\f[], \f[I]skip\f[], \f[I]progress\f[], and \f[I]progress-total\f[]. .br For example, the following will replicate the same output as \f[I]mode: color\f[]: .. code:: json { "start" : "{}", "success": "\\r\\u001b[1;32m{}\\u001b[0m\\n", "skip" : "\\u001b[2m{}\\u001b[0m\\n", "progress" : "\\r{0:>7}B {1:>7}B/s ", "progress-total": "\\r{3:>3}% {0:>7}B {1:>7}B/s " } \f[I]start\f[], \f[I]success\f[], and \f[I]skip\f[] are used to output the current filename, where \f[I]{}\f[] or \f[I]{0}\f[] is replaced with said filename. 
If a given format string contains printable characters other than that, their number needs to be specified as a \f[I][number of printable characters, format string]\f[] pair to get the correct results for \f[I]output.shorten\f[]. For example .. code:: json "start" : [12, "Downloading {}"] \f[I]progress\f[] and \f[I]progress-total\f[] are used when displaying the .br \f[I]download progress indicator\f[], \f[I]progress\f[] when the total number of bytes to download is unknown, .br \f[I]progress-total\f[] otherwise. For these format strings .br * \f[I]{0}\f[] is number of bytes downloaded .br * \f[I]{1}\f[] is number of downloaded bytes per second .br * \f[I]{2}\f[] is total number of bytes .br * \f[I]{3}\f[] is percent of bytes downloaded to total bytes .SS output.stdout & .stdin & .stderr .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]object\f[] .IP "Example:" 4 .. code:: json "utf-8" .. code:: json { "encoding": "utf-8", "errors": "replace", "line_buffering": true } .IP "Description:" 4 \f[I]Reconfigure\f[] a \f[I]standard stream\f[]. Possible options are .br * \f[I]encoding\f[] .br * \f[I]errors\f[] .br * \f[I]newline\f[] .br * \f[I]line_buffering\f[] .br * \f[I]write_through\f[] When this option is specified as a simple \f[I]string\f[], it is interpreted as \f[I]{"encoding": "<string>", "errors": "replace"}\f[] Note: \f[I]errors\f[] always defaults to \f[I]"replace"\f[] .SS output.shorten .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls whether the output strings should be shortened to fit on one console line. Set this option to \f[I]"eaw"\f[] to also work with east-asian characters with a display width greater than 1. .SS output.colors .IP "Type:" 6 \f[I]object\f[] (key -> ANSI color) .IP "Default:" 9 .. code:: json { "success": "1;32", "skip" : "2", "debug" : "0;37", "info" : "1;37", "warning": "1;33", "error" : "1;31" } .IP "Description:" 4 Controls the \f[I]ANSI colors\f[] used for various outputs. Output for \f[I]mode: color\f[] .br * \f[I]success\f[]: successfully downloaded files .br * \f[I]skip\f[]: skipped files Logging Messages: .br * \f[I]debug\f[]: debug logging messages .br * \f[I]info\f[]: info logging messages .br * \f[I]warning\f[]: warning logging messages .br * \f[I]error\f[]: error logging messages .SS output.ansi .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 On Windows, enable ANSI escape sequences and colored output .br by setting the \f[I]ENABLE_VIRTUAL_TERMINAL_PROCESSING\f[] flag for stdout and stderr. .br .SS output.skip .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Show skipped file downloads. .SS output.fallback .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Include fallback URLs in the output of \f[I]-g/--get-urls\f[]. .SS output.private .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Include private fields, i.e. fields whose name starts with an underscore, in the output of \f[I]-K/--list-keywords\f[] and \f[I]-j/--dump-json\f[]. .SS output.progress .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Controls the progress indicator when *gallery-dl* is run with multiple URLs as arguments. .br * \f[I]true\f[]: Show the default progress indicator (\f[I]"[{current}/{total}] {url}"\f[]) .br * \f[I]false\f[]: Do not show any progress indicator .br * Any \f[I]string\f[]: Show the progress indicator using this as a custom \f[I]format string\f[].
Possible replacement keys are \f[I]current\f[], \f[I]total\f[] and \f[I]url\f[]. .SS output.log .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]Logging Configuration\f[] .IP "Default:" 9 \f[I]"[{name}][{levelname}] {message}"\f[] .IP "Description:" 4 Configuration for logging output to stderr. If this is a simple \f[I]string\f[], it specifies the format string for logging messages. .SS output.logfile .IP "Type:" 6 .br * \f[I]Path\f[] .br * \f[I]Logging Configuration\f[] .IP "Description:" 4 File to write logging output to. .SS output.unsupportedfile .IP "Type:" 6 .br * \f[I]Path\f[] .br * \f[I]Logging Configuration\f[] .IP "Description:" 4 File to write external URLs unsupported by *gallery-dl* to. The default format string here is \f[I]"{message}"\f[]. .SS output.errorfile .IP "Type:" 6 .br * \f[I]Path\f[] .br * \f[I]Logging Configuration\f[] .IP "Description:" 4 File to write input URLs which returned an error to. The default format string here is also \f[I]"{message}"\f[]. When combined with \f[I]-I\f[]/\f[I]--input-file-comment\f[] or \f[I]-x\f[]/\f[I]--input-file-delete\f[], this option will cause *all* input URLs from these files to be commented/deleted after processing them and not just successful ones. .SS output.num-to-str .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Convert numeric values (\f[I]integer\f[] or \f[I]float\f[]) to \f[I]string\f[] before outputting them as JSON. .SH POSTPROCESSOR OPTIONS .SS classify.mapping .IP "Type:" 6 \f[I]object\f[] (directory -> extensions) .IP "Default:" 9 .. code:: json { "Pictures" : ["jpg", "jpeg", "png", "gif", "bmp", "svg", "webp", "avif", "heic", "heif", "ico", "psd"], "Video" : ["flv", "ogv", "avi", "mp4", "mpg", "mpeg", "3gp", "mkv", "webm", "vob", "wmv", "m4v", "mov"], "Music" : ["mp3", "aac", "flac", "ogg", "wma", "m4a", "wav"], "Archives" : ["zip", "rar", "7z", "tar", "gz", "bz2"], "Documents": ["txt", "pdf"] } .IP "Description:" 4 A mapping from directory names to filename extensions that should be stored in them. Files with an extension not listed will be ignored and stored in their default location. .SS compare.action .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"replace"\f[] .IP "Description:" 4 The action to take when files do **not** compare as equal. .br * \f[I]"replace"\f[]: Replace/Overwrite the old version with the new one .br * \f[I]"enumerate"\f[]: Add an enumeration index to the filename of the new version like \f[I]skip = "enumerate"\f[] .SS compare.equal .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"null"\f[] .IP "Description:" 4 The action to take when files do compare as equal. .br * \f[I]"abort:N"\f[]: Stop the current extractor run after \f[I]N\f[] consecutive files compared as equal. .br * \f[I]"terminate:N"\f[]: Stop the current extractor run, including parent extractors, after \f[I]N\f[] consecutive files compared as equal. .br * \f[I]"exit:N"\f[]: Exit the program after \f[I]N\f[] consecutive files compared as equal. .SS compare.shallow .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Only compare file sizes. Do not read and compare their content. .SS directory.event .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"prepare"\f[] .IP "Description:" 4 The event(s) for which \f[I]directory\f[] format strings are (re)evaluated. See \f[I]metadata.event\f[] for a list of available events. 
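For illustration, and assuming the usual \f[I]extractor.*.postprocessors\f[] list form, a \f[I]compare\f[] post processor that enumerates differing files and aborts after three consecutive equal ones might be declared as: .. code:: json { "extractor": { "postprocessors": [ { "name": "compare", "action": "enumerate", "equal": "abort:3", "shallow": false } ] } }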
.SS exec.archive .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]Path\f[] .IP "Description:" 4 Database to store IDs of executed commands in, similar to \f[I]extractor.*.archive\f[]. The following archive options are also supported: .br * \f[I]archive-format\f[] .br * \f[I]archive-prefix\f[] .br * \f[I]archive-pragma\f[] .br * \f[I]archive-table\f[] .SS exec.async .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Controls whether to wait for a subprocess to finish or to let it run asynchronously. .SS exec.command .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 .br * "convert {} {}.png && rm {}" .br * ["echo", "{user[account]}", "{id}"] .IP "Description:" 4 The command to run. .br * If this is a \f[I]string\f[], it will be executed using the system's shell, e.g. \f[I]/bin/sh\f[]. Any \f[I]{}\f[] will be replaced with the full path of a file or target directory, depending on \f[I]exec.event\f[]. .br * If this is a \f[I]list\f[], the first element specifies the program name and any further elements its arguments. Each element of this list is treated as a \f[I]format string\f[] using the files' metadata as well as \f[I]{_path}\f[], \f[I]{_directory}\f[], and \f[I]{_filename}\f[]. .SS exec.commands .IP "Type:" 6 \f[I]list\f[] of \f[I]commands\f[] .IP "Example:" 4 .. code:: json [ ["echo", "{user[account]}", "{id}"], ["magick", "convert", "{_path}", "\\fF {_path.rpartition('.')[0]}.png"], "rm {}" ] .IP "Description:" 4 Multiple \f[I]commands\f[] to run in succession. All \f[I]commands\f[] after the first returning with a non-zero exit status will not be run. .SS exec.event .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"after"\f[] .IP "Description:" 4 The event(s) for which \f[I]exec.command\f[] is run. See \f[I]metadata.event\f[] for a list of available events. .SS exec.session .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Start subprocesses in a new session. On Windows, this means passing \f[I]CREATE_NEW_PROCESS_GROUP\f[] as a \f[I]creationflags\f[] argument to \f[I]subprocess.Popen\f[] On POSIX systems, this means enabling the \f[I]start_new_session\f[] argument of \f[I]subprocess.Popen\f[] to have it call \f[I]setsid()\f[]. .SS hash.chunk-size .IP "Type:" 6 \f[I]integer\f[] .IP "Default:" 9 \f[I]32768\f[] .IP "Description:" 4 Number of bytes read per chunk during file hash computation. .SS hash.event .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"file"\f[] .IP "Description:" 4 The event(s) for which \f[I]file hashes\f[] are computed. See \f[I]metadata.event\f[] for a list of available events. .SS hash.filename .IP "Type:" 6 .br * \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Rebuild \f[I]filenames\f[] after computing \f[I]hash digests\f[] and adding them to the metadata dict. .SS hash.hashes .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]object\f[] (field name -> hash algorithm) .IP "Default:" 9 \f[I]"md5,sha1"\f[] .IP "Example:" 4 .. code:: json "sha256:hash_sha,sha3_512:hash_sha3" .. code:: json { "hash_sha" : "sha256", "hash_sha3": "sha3_512" } .IP "Description:" 4 Hash digests to compute. For a list of available hash algorithms, run .. code:: python -c "import hashlib; print('\\n'.join(hashlib.algorithms_available))" or see \f[I]python/hashlib\f[]. .br * If this is a \f[I]string\f[], it is parsed as a comma-separated list of algorithm-fieldname pairs: ..
code:: [ ":"] ["," ...] When \f[I]\f[] is omitted, \f[I]\f[] is used as algorithm name. .br * If this is an \f[I]object\f[], it is a \f[I]\f[] to \f[I]\f[] mapping for hash digests to compute. .SS metadata.mode .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"json"\f[] .IP "Description:" 4 Selects how to process metadata. .br * \f[I]"json"\f[]: write metadata using \f[I]json.dump()\f[] .br * \f[I]"jsonl"\f[]: write metadata in \f[I]JSON Lines \f[] format .br * \f[I]"tags"\f[]: write \f[I]tags\f[] separated by newlines .br * \f[I]"custom"\f[]: write the result of applying \f[I]metadata.content-format\f[] to a file's metadata dictionary .br * \f[I]"modify"\f[]: add or modify metadata entries .br * \f[I]"delete"\f[]: remove metadata entries .SS metadata.filename .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Example:" 4 "{id}.data.json" .IP "Description:" 4 A \f[I]format string\f[] to build the filenames for metadata files with. (see \f[I]extractor.filename\f[]) Using \f[I]"-"\f[] as filename will write all output to \f[I]stdout\f[]. If this option is set, \f[I]metadata.extension\f[] and \f[I]metadata.extension-format\f[] will be ignored. .SS metadata.directory .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"."\f[] .IP "Example:" 4 .br * "metadata" .br * ["..", "metadata", "\\fF {id // 500 * 500}"] .IP "Description:" 4 Directory where metadata files are stored in relative to \f[I]metadata.base-directory\f[]. .SS metadata.base-directory .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]Path\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Selects the relative location for metadata files. .br * \f[I]false\f[]: current target location for file downloads (\f[I]base-directory\f[] + directory_) .br * \f[I]true\f[]: current \f[I]base-directory\f[] location .br * any \f[I]Path\f[]: custom location .SS metadata.extension .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"json"\f[] or \f[I]"txt"\f[] .IP "Description:" 4 Filename extension for metadata files that will be appended to the original file names. .SS metadata.extension-format .IP "Type:" 6 \f[I]string\f[] .IP "Example:" 4 .br * "{extension}.json" .br * "json" .IP "Description:" 4 Custom format string to build filename extensions for metadata files with, which will replace the original filename extensions. Note: \f[I]metadata.extension\f[] is ignored if this option is set. .SS metadata.metadata-path .IP "Type:" 6 \f[I]string\f[] .IP "Example:" 4 "_meta_path" .IP "Description:" 4 Insert the path of generated files into metadata dictionaries as the given name. .SS metadata.event .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"file"\f[] .IP "Example:" 4 .br * "prepare,file,after" .br * ["prepare-after", "skip"] .IP "Description:" 4 The event(s) for which metadata gets written to a file. Available events are: \f[I]init\f[] After post processor initialization and before the first file download \f[I]finalize\f[] On extractor shutdown, e.g. 
after all files were downloaded \f[I]finalize-success\f[] On extractor shutdown when no error occurred \f[I]finalize-error\f[] On extractor shutdown when at least one error occurred \f[I]prepare\f[] Before a file download \f[I]prepare-after\f[] Before a file download, but after building and checking file paths \f[I]file\f[] When completing a file download, but before it gets moved to its target location \f[I]after\f[] After a file got moved to its target location \f[I]skip\f[] When skipping a file download \f[I]error\f[] After a file download failed \f[I]post\f[] When starting to download all files of a post, e.g. a Tweet on Twitter or a post on Patreon. \f[I]post-after\f[] After downloading all files of a post .SS metadata.include .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 ["id", "width", "height", "description"] .IP "Description:" 4 Include only the given top-level keys when writing JSON data. Note: Missing or undefined fields will be silently ignored. .SS metadata.exclude .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 ["blocked", "watching", "status"] .IP "Description:" 4 Exclude all given keys from written JSON data. Note: Cannot be used with \f[I]metadata.include\f[]. .SS metadata.fields .IP "Type:" 6 .br * \f[I]list\f[] of \f[I]strings\f[] .br * \f[I]object\f[] (field name -> \f[I]format string\f[]) .IP "Example:" 4 .. code:: json ["blocked", "watching", "status[creator][name]"] .. code:: json { "blocked" : "***", "watching" : "\\fE 'yes' if watching else 'no'", "status[username]": "{status[creator][name]!l}" } .IP "Description:" 4 .br * \f[I]"mode": "delete"\f[]: A list of metadata field names to remove. .br * \f[I]"mode": "modify"\f[]: An object with metadata field names mapping to a \f[I]format string\f[] whose result is assigned to said field name. .SS metadata.content-format .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 .br * "tags:\\n\\n{tags:J\\n}\\n" .br * ["tags:", "", "{tags:J\\n}"] .IP "Description:" 4 Custom format string to build the content of metadata files with. Note: Only applies for \f[I]"mode": "custom"\f[]. .SS metadata.ascii .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Escape all non-ASCII characters. See the \f[I]ensure_ascii\f[] argument of \f[I]json.dump()\f[] for further details. Note: Only applies for \f[I]"mode": "json"\f[] and \f[I]"jsonl"\f[]. .SS metadata.indent .IP "Type:" 6 .br * \f[I]integer\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]4\f[] .IP "Description:" 4 Indentation level of JSON output. See the \f[I]indent\f[] argument of \f[I]json.dump()\f[] for further details. Note: Only applies for \f[I]"mode": "json"\f[]. .SS metadata.separators .IP "Type:" 6 \f[I]list\f[] with two \f[I]string\f[] elements .IP "Default:" 9 \f[I][", ", ": "]\f[] .IP "Description:" 4 \f[I]\f[] - \f[I]\f[] pair to separate JSON keys and values with. See the \f[I]separators\f[] argument of \f[I]json.dump()\f[] for further details. Note: Only applies for \f[I]"mode": "json"\f[] and \f[I]"jsonl"\f[]. .SS metadata.sort .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Sort output by key. See the \f[I]sort_keys\f[] argument of \f[I]json.dump()\f[] for further details. Note: Only applies for \f[I]"mode": "json"\f[] and \f[I]"jsonl"\f[]. .SS metadata.open .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"w"\f[] .IP "Description:" 4 The \f[I]mode\f[] in which metadata files get opened.
For example, use \f[I]"a"\f[] to append to a file's content or \f[I]"w"\f[] to truncate it. See the \f[I]mode\f[] argument of \f[I]open()\f[] for further details. .SS metadata.encoding .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"utf-8"\f[] .IP "Description:" 4 Name of the encoding used to encode a file's content. See the \f[I]encoding\f[] argument of \f[I]open()\f[] for further details. .SS metadata.private .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Include private fields, i.e. fields whose name starts with an underscore. .SS metadata.skip .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Do not overwrite already existing files. .SS metadata.archive .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]Path\f[] .IP "Description:" 4 Database to store IDs of generated metadata files in, similar to \f[I]extractor.*.archive\f[]. The following archive options are also supported: .br * \f[I]archive-format\f[] .br * \f[I]archive-prefix\f[] .br * \f[I]archive-pragma\f[] .br * \f[I]archive-table \f[] .SS metadata.mtime .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Set modification times of generated metadata files according to the accompanying downloaded file. Enabling this option will only have an effect *if* there is actual \f[I]mtime\f[] metadata available, that is .br * after a file download (\f[I]"event": "file"\f[] (default), \f[I]"event": "after"\f[]) .br * when running *after* an \f[I]mtime\f[] post processor for the same \f[I]event\f[]. For example, a \f[I]metadata\f[] post processor for \f[I]"event": "post"\f[] will *not* be able to set its file's modification time unless an \f[I]mtime\f[] post processor with \f[I]"event": "post"\f[] runs *before* it. .SS mtime.event .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"file"\f[] .IP "Description:" 4 The event(s) for which \f[I]mtime.key\f[] or \f[I]mtime.value\f[] get evaluated. See \f[I]metadata.event\f[] for a list of available events. .SS mtime.key .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"date"\f[] .IP "Description:" 4 Name of the metadata field whose value should be used. This value must be either a UNIX timestamp or a \f[I]datetime\f[] object. Note: This option gets ignored if \f[I]mtime.value\f[] is set. .SS mtime.value .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Example:" 4 .br * "{status[date]}" .br * "{content[0:6]:R22/2022/D%Y%m%d/}" .IP "Description:" 4 A \f[I]format string\f[] whose value should be used. The resulting value must be either a UNIX timestamp or a \f[I]datetime\f[] object. .SS python.archive .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]Path\f[] .IP "Description:" 4 Database to store IDs of called Python functions in, similar to \f[I]extractor.*.archive\f[]. The following archive options are also supported: .br * \f[I]archive-format\f[] .br * \f[I]archive-prefix\f[] .br * \f[I]archive-pragma\f[] .br * \f[I]archive-table \f[] .SS python.event .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]"file"\f[] .IP "Description:" 4 The event(s) for which \f[I]python.function\f[] gets called. See \f[I]metadata.event\f[] for a list of available events. .SS python.function .IP "Type:" 6 \f[I]string\f[] .IP "Example:" 4 .br * "my_module:generate_text" .br * "~/.local/share/gdl_utils.py:resize" .IP "Description:" 4 The Python function to call.
This function is specified as \f[I]:\f[], where .br \f[I]\f[] is a \f[I]Module\f[] and .br \f[I]\f[] is the name of the function in that module. It gets called with the current metadata dict as argument. .SS rename.from .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 The \f[I]format string\f[] for filenames to rename. When no value is given, \f[I]extractor.*.filename\f[] is used. .SS rename.to .IP "Type:" 6 \f[I]string\f[] .IP "Description:" 4 The \f[I]format string\f[] for target filenames. When no value is given, \f[I]extractor.*.filename\f[] is used. .SS rename.skip .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Do not rename a file when another file with the target name already exists. .SS ugoira.extension .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"webm"\f[] .IP "Description:" 4 Filename extension for the resulting video files. .SS ugoira.ffmpeg-args .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 \f[I]null\f[] .IP "Example:" 4 ["-c:v", "libvpx-vp9", "-an", "-b:v", "2M"] .IP "Description:" 4 Additional \f[I]ffmpeg\f[] command-line arguments. .SS ugoira.ffmpeg-demuxer .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]auto\f[] .IP "Description:" 4 \f[I]ffmpeg\f[] demuxer to read and process input files with. Possible values are .br * "\f[I]concat\f[]" (inaccurate frame timecodes for non-uniform frame delays) .br * "\f[I]image2\f[]" (accurate timecodes, requires nanosecond file timestamps, i.e. no Windows or macOS) .br * "mkvmerge" (accurate timecodes, only WebM or MKV, requires \f[I]mkvmerge\f[]) .br * "archive" (store "original" frames in a \f[I].zip\f[] archive) "auto" will select mkvmerge if available and fall back to concat otherwise. .SS ugoira.ffmpeg-location .IP "Type:" 6 \f[I]Path\f[] .IP "Default:" 9 \f[I]"ffmpeg"\f[] .IP "Description:" 4 Location of the \f[I]ffmpeg\f[] (or \f[I]avconv\f[]) executable to use. .SS ugoira.mkvmerge-location .IP "Type:" 6 \f[I]Path\f[] .IP "Default:" 9 \f[I]"mkvmerge"\f[] .IP "Description:" 4 Location of the \f[I]mkvmerge\f[] executable for use with the \f[I]mkvmerge demuxer\f[]. .SS ugoira.ffmpeg-output .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]"error"\f[] .IP "Description:" 4 Controls \f[I]ffmpeg\f[] output. .br * \f[I]true\f[]: Enable \f[I]ffmpeg\f[] output .br * \f[I]false\f[]: Disable all \f[I]ffmpeg\f[] output .br * any \f[I]string\f[]: Pass \f[I]-hide_banner\f[] and \f[I]-loglevel\f[] with this value as argument to \f[I]ffmpeg\f[] .SS ugoira.ffmpeg-twopass .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Enable Two-Pass encoding. .SS ugoira.framerate .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"auto"\f[] .IP "Description:" 4 Controls the frame rate argument (\f[I]-r\f[]) for \f[I]ffmpeg\f[] .br * \f[I]"auto"\f[]: Automatically assign a fitting frame rate based on delays between frames. .br * \f[I]"uniform"\f[]: Like \f[I]auto\f[], but assign an explicit frame rate only to Ugoira with uniform frame delays. .br * any other \f[I]string\f[]: Use this value as argument for \f[I]-r\f[]. .br * \f[I]null\f[] or an empty \f[I]string\f[]: Don't set an explicit frame rate. .SS ugoira.keep-files .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Keep ZIP archives after conversion. 
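As a rough sketch combining several of the \f[I]ugoira\f[] options above (the encoder arguments are taken from the \f[I]ffmpeg-args\f[] example and are not defaults), a \f[I]ugoira\f[] post-processor entry might look like this:
.. code:: json
{
    "name"          : "ugoira",
    "extension"     : "webm",
    "ffmpeg-args"   : ["-c:v", "libvpx-vp9", "-an", "-b:v", "2M"],
    "ffmpeg-twopass": true,
    "ffmpeg-demuxer": "auto",
    "framerate"     : "auto",
    "keep-files"    : false
}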
.SS ugoira.libx264-prevent-odd .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Prevent \f[I]"width/height not divisible by 2"\f[] errors when using \f[I]libx264\f[] or \f[I]libx265\f[] encoders by applying a simple cropping filter. See this \f[I]Stack Overflow thread\f[] for more information. This option, when \f[I]libx264/5\f[] is used, automatically adds \f[I]["-vf", "crop=iw-mod(iw\\\\,2):ih-mod(ih\\\\,2)"]\f[] to the list of \f[I]ffmpeg\f[] command-line arguments to reduce an odd width/height by 1 pixel and make them even. .SS ugoira.metadata .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 When using \f[I]"mode": "archive"\f[], save Ugoira frame delay data as \f[I]animation.json\f[] within the archive file. If this is a \f[I]string\f[], use it as alternate filename for frame delay files. .SS ugoira.mtime .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Set modification times of generated ugoira animations. .SS ugoira.repeat-last-frame .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Allow repeating the last frame when necessary to prevent it from only being displayed for a very short amount of time. .SS ugoira.skip .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Do not convert frames if target file already exists. .SS zip.compression .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"store"\f[] .IP "Description:" 4 Compression method to use when writing the archive. Possible values are \f[I]"store"\f[], \f[I]"zip"\f[], \f[I]"bzip2"\f[], \f[I]"lzma"\f[]. .SS zip.extension .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"zip"\f[] .IP "Description:" 4 Filename extension for the created ZIP archive. .SS zip.files .IP "Type:" 6 \f[I]list\f[] of \f[I]Path\f[] .IP "Example:" 4 ["info.json"] .IP "Description:" 4 List of extra files to be added to a ZIP archive. Note: Relative paths are relative to the current \f[I]download directory\f[]. .SS zip.keep-files .IP "Type:" 6 \f[I]bool\f[] .IP "Default:" 9 \f[I]false\f[] .IP "Description:" 4 Keep the actual files after writing them to a ZIP archive. .SS zip.mode .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"default"\f[] .IP "Description:" 4 .br * \f[I]"default"\f[]: Write the central directory file header once after everything is done or an exception is raised. .br * \f[I]"safe"\f[]: Update the central directory file header each time a file is stored in a ZIP archive. This greatly reduces the chance a ZIP archive gets corrupted in case the Python interpreter gets shut down unexpectedly (power outage, SIGKILL) but is also a lot slower. .SH MISCELLANEOUS OPTIONS .SS extractor.modules .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Default:" 9 The \f[I]modules\f[] list in \f[I]extractor/__init__.py\f[] .IP "Example:" 4 ["reddit", "danbooru", "mangadex"] .IP "Description:" 4 List of internal modules to load when searching for a suitable extractor class. Useful to reduce startup time and memory usage. .SS extractor.module-sources .IP "Type:" 6 \f[I]list\f[] of \f[I]Path\f[] instances .IP "Example:" 4 ["~/.config/gallery-dl/modules", null] .IP "Description:" 4 List of directories to load external extractor modules from. Any file in a specified directory with a \f[I].py\f[] filename extension gets \f[I]imported\f[] and searched for potential extractors, i.e. classes with a \f[I]pattern\f[] attribute.
Note: \f[I]null\f[] references internal extractors defined in \f[I]extractor/__init__.py\f[] or by \f[I]extractor.modules\f[]. .SS extractor.category-map .IP "Type:" 6 .br * \f[I]object\f[] (category -> category) .br * \f[I]string\f[] .IP "Example:" 4 .. code:: json { "danbooru": "booru", "gelbooru": "booru" } .IP "Description:" 4 A JSON object mapping category names to their replacements. Special values: .br * \f[I]"compat"\f[] .. code:: json { "coomer" : "coomerparty", "kemono" : "kemonoparty", "schalenetwork": "koharu", "naver-chzzk" : "chzzk", "naver-blog" : "naver", "naver-webtoon": "naverwebtoon", "pixiv-novel" : "pixiv", "pixiv-novel:novel" : ["pixiv", "novel"], "pixiv-novel:user" : ["pixiv", "novel-user"], "pixiv-novel:series" : ["pixiv", "novel-series"], "pixiv-novel:bookmark": ["pixiv", "novel-bookmark"] } .SS extractor.config-map .IP "Type:" 6 \f[I]object\f[] (category -> category) .IP "Default:" 9 .. code:: json { "coomerparty" : "coomer", "kemonoparty" : "kemono", "koharu" : "schalenetwork", "chzzk" : "naver-chzzk", "naver" : "naver-blog", "naverwebtoon": "naver-webtoon", "pixiv" : "pixiv-novel" } .IP "Description:" 4 Duplicate the configuration settings of extractor categories to other names. For example, a \f[I]"naver": "naver-blog"\f[] key-value pair will make all \f[I]naver\f[] config settings available for \f[I]naver-blog\f[] extractors as well. .SS jinja.environment .IP "Type:" 6 \f[I]object\f[] (name -> value) .IP "Example:" 4 .. code:: json { "variable_start_string": "(((", "variable_end_string" : ")))", "keep_trailing_newline": true } .IP "Description:" 4 Initialization parameters for the \f[I]jinja\f[] \f[I]Environment\f[] object. .SS jinja.policies .IP "Type:" 6 \f[I]object\f[] (name -> value) .IP "Example:" 4 .. code:: json { "urlize.rel": "nofollow noopener", "ext.i18n.trimmed": true } .IP "Description:" 4 \f[I]jinja\f[] \f[I]Policies\f[] .SS jinja.filters .IP "Type:" 6 \f[I]Module\f[] .IP "Description:" 4 A Python \f[I]Module\f[] containing custom \f[I]jinja\f[] \f[I]filters\f[] .SS jinja.tests .IP "Type:" 6 \f[I]Module\f[] .IP "Description:" 4 A Python \f[I]Module\f[] containing custom \f[I]jinja\f[] \f[I]tests\f[] .SS globals .IP "Type:" 6 \f[I]Module\f[] .IP "Description:" 4 A Python \f[I]Module\f[] whose namespace, in addition to the \f[I]GLOBALS\f[] dict in \f[I]util.py\f[], is used as \f[I]globals parameter\f[] for compiled Python expressions. .SS cache.file .IP "Type:" 6 \f[I]Path\f[] .IP "Default:" 9 .br * (\f[I]%APPDATA%\f[] or \f[I]"~"\f[]) + \f[I]"/gallery-dl/cache.sqlite3"\f[] on Windows .br * (\f[I]$XDG_CACHE_HOME\f[] or \f[I]"~/.cache"\f[]) + \f[I]"/gallery-dl/cache.sqlite3"\f[] on all other platforms .IP "Description:" 4 Path of the SQLite3 database used to cache login sessions, cookies and API tokens across gallery-dl invocations. Set this option to \f[I]null\f[] or an invalid path to disable this cache. .SS filters-environment .IP "Type:" 6 .br * \f[I]bool\f[] .br * \f[I]string\f[] .IP "Default:" 9 \f[I]true\f[] .IP "Description:" 4 Evaluate filter expressions in a special environment preventing them from raising fatal exceptions.
\f[I]true\f[] or \f[I]"tryexcept"\f[]: Wrap expressions in a try/except block; Evaluate expressions raising an exception as \f[I]false\f[] \f[I]false\f[] or \f[I]"raw"\f[]: Do not wrap expressions in a special environment \f[I]"defaultdict"\f[]: Prevent exceptions when accessing undefined variables by using a \f[I]defaultdict\f[] .SS format-separator .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"/"\f[] .IP "Description:" 4 Character(s) used as argument separator in format string \f[I]format specifiers\f[]. For example, setting this option to \f[I]"#"\f[] would allow a replacement operation to be \f[I]Rold#new#\f[] instead of the default \f[I]Rold/new/\f[]. .SS input-files .IP "Type:" 6 \f[I]list\f[] of \f[I]Path\f[] .IP "Example:" 4 ["~/urls.txt", "$HOME/input"] .IP "Description:" 4 Additional input files. .SS signals-ignore .IP "Type:" 6 \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 ["SIGTTOU", "SIGTTIN", "SIGTERM"] .IP "Description:" 4 The list of signal names to ignore, i.e. set \f[I]SIG_IGN\f[] as signal handler for. .SS signals-actions .IP "Type:" 6 \f[I]object\f[] (signal -> \f[I]Action(s)\f[]) .IP "Example:" 4 .. code:: json { "SIGINT" : "flag download = stop", "SIGUSR1": [ "print Received SIGUSR1", "exec notify.sh", "exit 127" ] } .IP "Description:" 4 \f[I]Action(s)\f[] to perform when a \f[I]signal\f[] is received. .SS subconfigs .IP "Type:" 6 \f[I]list\f[] of \f[I]Path\f[] .IP "Example:" 4 ["~/cfg-twitter.json", "~/cfg-reddit.json"] .IP "Description:" 4 Additional configuration files to load. .SS warnings .IP "Type:" 6 \f[I]string\f[] .IP "Default:" 9 \f[I]"default"\f[] .IP "Description:" 4 The \f[I]Warnings Filter action\f[] used for (urllib3) warnings. .SH API TOKENS & IDS .SS extractor.deviantart.client-id & .client-secret .IP "Type:" 6 \f[I]string\f[] .IP "How To:" 4 .br * login and visit DeviantArt's \f[I]Applications & Keys\f[] section .br * click "Register Application" .br * scroll to "OAuth2 Redirect URI Whitelist (Required)" and enter "https://mikf.github.io/gallery-dl/oauth-redirect.html" .br * scroll to the bottom and agree to the API License Agreement, Submission Policy, and Terms of Service .br * click "Save" .br * copy \f[I]client_id\f[] and \f[I]client_secret\f[] of your new application and put them in your configuration file as \f[I]"client-id"\f[] and \f[I]"client-secret"\f[] .br * clear your \f[I]cache\f[] to delete any remaining \f[I]access-token\f[] entries. (\f[I]gallery-dl --clear-cache deviantart\f[]) .br * get a new \f[I]refresh-token\f[] for the new \f[I]client-id\f[] (\f[I]gallery-dl oauth:deviantart\f[]) .SS extractor.flickr.api-key & .api-secret .IP "Type:" 6 \f[I]string\f[] .IP "How To:" 4 .br * login and \f[I]Create an App\f[] in Flickr's \f[I]App Garden\f[] .br * click "APPLY FOR A NON-COMMERCIAL KEY" .br * fill out the form with a random name and description and click "SUBMIT" .br * copy \f[I]Key\f[] and \f[I]Secret\f[] and put them in your configuration file as \f[I]"api-key"\f[] and \f[I]"api-secret"\f[] .SS extractor.mangadex.client-id & .client-secret .IP "Type:" 6 \f[I]string\f[] .IP "How To:" 4 .br * login and go to your \f[I]User Settings\f[] .br * open the "API Clients" section .br * click "\f[I]+ Create\f[]" .br * choose a name .br * click "\f[I]✔️ Create\f[]" .br * wait for approval / reload the page .br * copy the value after "AUTOAPPROVED ACTIVE" in the form "personal-client-..."
and put it in your configuration file as \f[I]"client-id"\f[] .br * click "\f[I]Get Secret\f[]", then "\f[I]Copy Secret\f[]", and paste it into your configuration file as \f[I]"client-secret"\f[] .SS extractor.reddit.client-id & .user-agent .IP "Type:" 6 \f[I]string\f[] .IP "How To:" 4 .br * login and visit the \f[I]apps\f[] section of your account's preferences .br * click the "are you a developer? create an app..." button .br * fill out the form: .br * choose a name .br * select "installed app" .br * set \f[I]http://localhost:6414/\f[] as "redirect uri" .br * solve the "I'm not a robot" reCAPTCHA if needed .br * click "create app" .br * copy the client id (third line, under your application's name and "installed app") and put it in your configuration file as \f[I]"client-id"\f[] .br * use "\f[I]Python::v1.0 (by /u/)\f[]" as \f[I]user-agent\f[] and replace \f[I]\f[] and \f[I]\f[] accordingly (see Reddit's \f[I]API access rules\f[]) .br * clear your \f[I]cache\f[] to delete any remaining \f[I]access-token\f[] entries. (\f[I]gallery-dl --clear-cache reddit\f[]) .br * get a \f[I]refresh-token\f[] for the new \f[I]client-id\f[] (\f[I]gallery-dl oauth:reddit\f[]) .SS extractor.smugmug.api-key & .api-secret .IP "Type:" 6 \f[I]string\f[] .IP "How To:" 4 .br * login and \f[I]Apply for an API Key\f[] .br * use a random name and description, set "Type" to "Application", "Platform" to "All", and "Use" to "Non-Commercial" .br * fill out the two checkboxes at the bottom and click "Apply" .br * copy \f[I]API Key\f[] and \f[I]API Secret\f[] and put them in your configuration file as \f[I]"api-key"\f[] and \f[I]"api-secret"\f[] .SS extractor.tumblr.api-key & .api-secret .IP "Type:" 6 \f[I]string\f[] .IP "How To:" 4 .br * login and visit Tumblr's \f[I]Applications\f[] section .br * click "Register application" .br * fill out the form: use a random name and description, set https://example.org/ as "Application Website" and "Default callback URL" .br * solve Google's "I'm not a robot" challenge and click "Register" .br * click "Show secret key" (below "OAuth Consumer Key") .br * copy your \f[I]OAuth Consumer Key\f[] and \f[I]Secret Key\f[] and put them in your configuration file as \f[I]"api-key"\f[] and \f[I]"api-secret"\f[] .SH CUSTOM TYPES .SS Date .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]integer\f[] .IP "Example:" 4 .br * "2019-01-01T00:00:00" .br * "2019" with "%Y" as \f[I]date-format\f[] .br * 1546297200 .IP "Description:" 4 A \f[I]Date\f[] value represents a specific point in time. .br * If given as \f[I]string\f[], it is parsed according to \f[I]date-format\f[]. .br * If given as \f[I]integer\f[], it is interpreted as UTC timestamp. .SS Duration .IP "Type:" 6 .br * \f[I]float\f[] .br * \f[I]list\f[] with 2 \f[I]floats\f[] .br * \f[I]string\f[] .IP "Example:" 4 .br * 2.85 .br * [1.5, 3.0] .br * "2.85", "1.5-3.0" .IP "Description:" 4 A \f[I]Duration\f[] represents a span of time in seconds. .br * If given as a single \f[I]float\f[], it will be used as that exact value. .br * If given as a \f[I]list\f[] with 2 floating-point numbers \f[I]a\f[] & \f[I]b\f[] , it will be randomly chosen with uniform distribution such that \f[I]a <= N <= b\f[]. (see \f[I]random.uniform()\f[]) .br * If given as a \f[I]string\f[], it can either represent a single \f[I]float\f[] value (\f[I]"2.85"\f[]) or a range (\f[I]"1.5-3.0"\f[]). 
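To illustrate how \f[I]Date\f[] and \f[I]Duration\f[] values appear in practice, a short sketch (the placement under these particular extractor options is an assumption made for illustration; the values mirror the examples above):
.. code:: json
{
    "extractor": {
        "behance" : {"sleep-request": "2.0-4.0"},
        "exhentai": {"sleep-request": [2.0, 4.8]},
        "reddit"  : {
            "date-min": "2019-01-01T00:00:00",
            "date-max": 1546297200
        }
    }
}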
.SS Module .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]Path\f[] .IP "Example:" 4 .br * "gdl_utils" .br * "~/.local/share/gdl/" .br * "~/.local/share/gdl_utils.py" .IP "Description:" 4 A Python \f[I]Module\f[] This can be one of .br * the name of an \f[I]importable\f[] Python module .br * the \f[I]Path\f[] to a Python \f[I]package\f[] .br * the \f[I]Path\f[] to a .py file See \f[I]Python/Modules\f[] for details. .SS Path .IP "Type:" 6 .br * \f[I]string\f[] .br * \f[I]list\f[] of \f[I]strings\f[] .IP "Example:" 4 .br * "file.ext" .br * "~/path/to/file.ext" .br * "$HOME/path/to/file.ext" .br * ["$HOME", "path", "to", "file.ext"] .IP "Description:" 4 A \f[I]Path\f[] is a \f[I]string\f[] representing the location of a file or directory. Simple \f[I]tilde expansion\f[] and \f[I]environment variable expansion\f[] is supported. .IP "Note::" 4 In Windows environments, both backslashes \f[I]\\\f[] as well as forward slashes \f[I]/\f[] can be used as path separators. However, since backslashes are JSON's escape character, they themselves must be escaped as \f[I]\\\\\f[]. For example, a path like \f[I]C:\\path\\to\\file.ext\f[] has to be specified as .br * \f[I]"C:\\\\path\\\\to\\\\file.ext"\f[] when using backslashes .br * \f[I]"C:/path/to/file.ext"\f[] when using forward slashes in a JSON file. .SS Logging Configuration .IP "Type:" 6 \f[I]object\f[] .IP "Example:" 4 .. code:: json { "format" : "{asctime} {name}: {message}", "format-date": "%H:%M:%S", "path" : "~/log.txt", "encoding" : "ascii" } .. code:: json { "level" : "debug", "format": { "debug" : "debug: {message}", "info" : "[{name}] {message}", "warning": "Warning: {message}", "error" : "ERROR: {message}" } } .IP "Description:" 4 Extended logging output configuration. .br * format .br * General format string for logging messages or an \f[I]object\f[] with format strings for each loglevel. In addition to the default \f[I]LogRecord attributes\f[], it is also possible to access the current \f[I]extractor\f[], \f[I]job\f[], \f[I]path\f[], and keywords objects and their attributes, for example \f[I]"{extractor.url}"\f[], \f[I]"{path.filename}"\f[], \f[I]"{keywords.title}"\f[] .br * Default: \f[I]"[{name}][{levelname}] {message}"\f[] .br * format-date .br * Format string for \f[I]{asctime}\f[] fields in logging messages (see \f[I]strftime() directives\f[]) .br * Default: \f[I]"%Y-%m-%d %H:%M:%S"\f[] .br * level .br * Minimum logging message level (one of \f[I]"debug"\f[], \f[I]"info"\f[], \f[I]"warning"\f[], \f[I]"error"\f[], \f[I]"exception"\f[]) .br * Default: \f[I]"info"\f[] .br * path .br * \f[I]Path\f[] to the output file .br * mode .br * Mode in which the file is opened; use \f[I]"w"\f[] to truncate or \f[I]"a"\f[] to append (see \f[I]open()\f[]) .br * Default: \f[I]"w"\f[] .br * encoding .br * File encoding .br * Default: \f[I]"utf-8"\f[] Note: path, mode, and encoding are only applied when configuring logging output to a file. .SS Postprocessor Configuration .IP "Type:" 6 \f[I]object\f[] .IP "Example:" 4 .. code:: json { "name": "mtime" } .. code:: json { "name" : "zip", "compression": "store", "extension" : "cbz", "filter" : "extension not in ('zip', 'rar')", "whitelist" : ["mangadex", "exhentai", "nhentai"] } .IP "Description:" 4 An \f[I]object\f[] containing a \f[I]"name"\f[] attribute specifying the post-processor type, as well as any of its \f[I]options\f[]. It is possible to set a \f[I]"filter"\f[] expression similar to \f[I]image-filter\f[] to only run a post-processor conditionally. 
It is also possible to set a \f[I]"whitelist"\f[] or \f[I]"blacklist"\f[] to only enable or disable a post-processor for the specified extractor categories. The available post-processor types are \f[I]classify\f[] Categorize files by filename extension \f[I]compare\f[] Compare versions of the same file and replace/enumerate them on mismatch .br (requires \f[I]downloader.*.part\f[] = \f[I]true\f[] and \f[I]extractor.*.skip\f[] = \f[I]false\f[]) .br \f[I]directory\f[] Reevaluate \f[I]directory\f[] format strings \f[I]exec\f[] Execute external commands \f[I]hash\f[] Compute file hash digests \f[I]metadata\f[] Write metadata to separate files \f[I]mtime\f[] Set file modification time according to its metadata \f[I]python\f[] Call Python functions \f[I]rename\f[] Rename previously downloaded files \f[I]ugoira\f[] Convert Pixiv Ugoira to WebM using \f[I]ffmpeg\f[] \f[I]zip\f[] Store files in a ZIP archive .SS Action .IP "Type:" 6 \f[I]string\f[] .IP "Example:" 4 .br * "exit" .br * "print Hello World" .br * "raise AbortExtraction an error occurred" .br * "flag file = terminate" .IP "Description:" 4 An \f[I]Action\f[] is parsed as Action Type followed by (optional) arguments. It is possible to specify more than one \f[I]action\f[] by providing them as a \f[I]list\f[]: \f[I]["", "", …]\f[] Supported Action Types: \f[I]status\f[]: Modify job exit status. .br Expected syntax is \f[I] \f[] (e.g. \f[I]= 100\f[]). .br Supported operators are \f[I]=\f[] (assignment), \f[I]&\f[] (bitwise AND), \f[I]|\f[] (bitwise OR), \f[I]^\f[] (bitwise XOR). \f[I]level\f[]: Modify severity level of the current logging message. .br Can be one of \f[I]debug\f[], \f[I]info\f[], \f[I]warning\f[], \f[I]error\f[] or an integer value. .br \f[I]print\f[]: Write argument to stdout. \f[I]exec\f[]: Run a shell command. \f[I]abort\f[]: Stop the current extractor run. \f[I]terminate\f[]: Stop the current extractor run, including parent extractors. \f[I]restart\f[]: Restart the current extractor run. \f[I]raise\f[]: Raise an exception. This can be an exception defined in \f[I]exception.py\f[] or a \f[I]built-in exception\f[] (e.g. \f[I]ZeroDivisionError\f[]) \f[I]flag\f[]: Set a \f[I]flag\f[]. Expected syntax is \f[I][ = ]\f[] (e.g. \f[I]post = stop\f[]) .br \f[I]\f[] can be one of \f[I]file\f[], \f[I]post\f[], \f[I]child\f[], \f[I]download\f[] .br \f[I]\f[] can be one of \f[I]stop\f[], \f[I]abort\f[], \f[I]terminate\f[], \f[I]restart\f[] (default \f[I]stop\f[]) .br \f[I]wait\f[]: Sleep for a given \f[I]Duration\f[] or .br wait until Enter is pressed when no argument was given. .br \f[I]exit\f[]: Exit the program with the given argument as exit status.
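As a sketch combining several of these Action types in a \f[I]signals-actions\f[] mapping (the chosen signals, message text, and exit status are only illustrative):
.. code:: json
{
    "signals-actions": {
        "SIGTERM": [
            "print Received SIGTERM",
            "flag post = stop",
            "status = 100"
        ],
        "SIGUSR2": "wait"
    }
}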
.SH BUGS https://github.com/mikf/gallery-dl/issues .SH AUTHORS Mike Fährmann .br and https://github.com/mikf/gallery-dl/graphs/contributors .SH "SEE ALSO" .BR gallery-dl (1) ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1753638554.0389762 gallery_dl-1.30.2/docs/0000755000175000017500000000000015041463232013337 5ustar00mikemike././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/docs/gallery-dl-example.conf0000644000175000017500000003460715040344700017702 0ustar00mikemike{ "extractor": { "base-directory": "~/gallery-dl/", "#": "set global archive file for all extractors", "archive": "~/gallery-dl/archive.sqlite3", "archive-pragma": ["journal_mode=WAL", "synchronous=NORMAL"], "#": "add two custom keywords into the metadata dictionary", "#": "these can be used to further refine your output directories or filenames", "keywords": {"bkey": "", "ckey": ""}, "#": "make sure that custom keywords are empty, i.e. they don't appear unless specified by the user", "keywords-default": "", "#": "replace invalid path characters with unicode alternatives", "path-restrict": { "\\": "⧹", "/" : "⧸", "|" : "│", ":" : "꞉", "*" : "∗", "?" : "?", "\"": "″", "<" : "﹤", ">" : "﹥" }, "#": "write tags for several *booru sites", "postprocessors": [ { "name": "metadata", "mode": "tags", "whitelist": ["danbooru", "moebooru", "sankaku"] } ], "pixiv": { "#": "override global archive path for pixiv", "archive": "~/gallery-dl/archive-pixiv.sqlite3", "#": "set custom directory and filename format strings for all pixiv downloads", "filename": "{id}{num}.{extension}", "directory": ["Pixiv", "Works", "{user[id]}"], "refresh-token": "aBcDeFgHiJkLmNoPqRsTuVwXyZ01234567890-FedC9", "#": "transform ugoira into lossless MKVs", "ugoira": true, "postprocessors": ["ugoira-copy"], "#": "use special settings for favorites and bookmarks", "favorite": { "directory": ["Pixiv", "Favorites", "{user[id]}"] }, "bookmark": { "directory": ["Pixiv", "My Bookmarks"], "refresh-token": "01234567890aBcDeFgHiJkLmNoPqRsTuVwXyZ-ZyxW1" } }, "danbooru": { "ugoira": true, "postprocessors": ["ugoira-webm"] }, "exhentai": { "#": "use cookies instead of logging in with username and password", "cookies": { "ipb_member_id": "12345", "ipb_pass_hash": "1234567890abcdef", "igneous" : "123456789", "hath_perks" : "m1.m2.m3.a-123456789a", "sk" : "n4m34tv3574m2c4e22c35zgeehiw", "sl" : "dm_2" }, "#": "wait 2 to 4.8 seconds between HTTP requests", "sleep-request": [2.0, 4.8], "filename": "{num:>04}_{name}.{extension}", "directory": ["{category!c}", "{title}"] }, "sankaku": { "#": "authentication with cookies is not possible for sankaku", "username": "user", "password": "#secret#" }, "furaffinity": { "#": "authentication with username and password is not possible due to CAPTCHA", "cookies": { "a": "01234567-89ab-cdef-fedc-ba9876543210", "b": "fedcba98-7654-3210-0123-456789abcdef" }, "descriptions": "html", "postprocessors": ["content"] }, "deviantart": { "#": "download 'gallery' and 'scraps' images for user profile URLs", "include": "gallery,scraps", "#": "use custom API credentials to avoid 429 errors", "client-id": "98765", "client-secret": "0123456789abcdef0123456789abcdef", "refresh-token": "0123456789abcdef0123456789abcdef01234567", "#": "put description texts into a separate directory", "metadata": true, "postprocessors": [ { "name": "metadata", "mode": "custom", "directory" : "Descriptions", "content-format" : "{description}\n", "extension-format": "descr.txt" } ] }, 
"kemonoparty": { "postprocessors": [ { "name": "metadata", "event": "post", "filename": "{id} {title}.txt", "#": "write text content and external URLs", "mode": "custom", "format": "{content}\n{embed[url]:?/\n/}", "#": "onlx write file if there is an external link present", "filter": "embed.get('url') or re.search(r'(?i)(gigafile|xgf|1drv|mediafire|mega|google|drive)', content)" } ] }, "flickr": { "access-token": "1234567890-abcdef", "access-token-secret": "1234567890abcdef", "size-max": 1920 }, "mangadex": { "#": "only download safe/suggestive chapters translated to English", "lang": "en", "ratings": ["safe", "suggestive"], "#": "put chapters into '.cbz' archives", "postprocessors": ["cbz"] }, "reddit": { "#": "only spawn child extractors for links to specific sites", "whitelist": ["imgur", "redgifs"], "#": "put files from child extractors into the reddit directory", "parent-directory": true, "#": "transfer metadata to any child extractor as '_reddit'", "parent-metadata": "_reddit" }, "imgur": { "#": "general imgur settings", "filename": "{id}.{extension}" }, "reddit>imgur": { "#": "special settings for imgur URLs found in reddit posts", "directory": [], "filename": "{_reddit[id]} {_reddit[title]} {id}.{extension}" }, "tumblr": { "posts" : "all", "external": false, "reblogs" : false, "inline" : true, "#": "use special settings when downloading liked posts", "likes": { "posts" : "video,photo,link", "external": true, "reblogs" : true } }, "twitter": { "#": "write text content for *all* tweets", "postprocessors": ["content"], "text-tweets": true }, "ytdl": { "#": "enable 'ytdl' extractor", "#": "i.e. invoke ytdl on all otherwise unsupported input URLs", "enabled": true, "#": "use yt-dlp instead of youtube-dl", "module": "yt_dlp", "#": "load ytdl options from config file", "config-file": "~/yt-dlp.conf" }, "mastodon": { "#": "add 'tabletop.social' as recognized mastodon instance", "#": "(run 'gallery-dl oauth:mastodon:tabletop.social to get an access token')", "tabletop.social": { "root": "https://tabletop.social", "access-token": "513a36c6..." 
}, "#": "set filename format strings for all 'mastodon' instances", "directory": ["mastodon", "{instance}", "{account[username]!l}"], "filename" : "{id}_{media[id]}.{extension}" }, "foolslide": { "#": "add two more foolslide instances", "otscans" : {"root": "https://otscans.com/foolslide"}, "helvetica": {"root": "https://helveticascans.com/r" } }, "foolfuuka": { "#": "add two other foolfuuka 4chan archives", "fireden-onion": {"root": "http://ydt6jy2ng3s3xg2e.onion"}, "scalearchive" : {"root": "https://archive.scaled.team" } }, "gelbooru_v01": { "#": "add a custom gelbooru_v01 instance", "#": "this is just an example, this specific instance is already included!", "allgirlbooru": {"root": "https://allgirl.booru.org"}, "#": "the following options are used for all gelbooru_v01 instances", "tag": { "directory": { "locals().get('bkey')": ["Booru", "AllGirlBooru", "Tags", "{bkey}", "{ckey}", "{search_tags}"], "" : ["Booru", "AllGirlBooru", "Tags", "_Unsorted", "{search_tags}"] } }, "post": { "directory": ["Booru", "AllGirlBooru", "Posts"] }, "archive": "~/gallery-dl/custom-archive-file-for-gelbooru_v01_instances.db", "filename": "{tags}_{id}_{md5}.{extension}", "sleep-request": [0, 1.2] }, "gelbooru_v02": { "#": "add a custom gelbooru_v02 instance", "#": "this is just an example, this specific instance is already included!", "tbib": { "root": "https://tbib.org", "#": "some sites have different domains for API access", "#": "use the 'api_root' option in addition to the 'root' setting here" } }, "tbib": { "#": "the following options are only used for TBIB", "#": "gelbooru_v02 has four subcategories at the moment, use custom directory settings for all of these", "tag": { "directory": { "locals().get('bkey')": ["Other Boorus", "TBIB", "Tags", "{bkey}", "{ckey}", "{search_tags}"], "" : ["Other Boorus", "TBIB", "Tags", "_Unsorted", "{search_tags}"] } }, "pool": { "directory": { "locals().get('bkey')": ["Other Boorus", "TBIB", "Pools", "{bkey}", "{ckey}", "{pool}"], "" : ["Other Boorus", "TBIB", "Pools", "_Unsorted", "{pool}"] } }, "favorite": { "directory": { "locals().get('bkey')": ["Other Boorus", "TBIB", "Favorites", "{bkey}", "{ckey}", "{favorite_id}"], "" : ["Other Boorus", "TBIB", "Favorites", "_Unsorted", "{favorite_id}"] } }, "post": { "directory": ["Other Boorus", "TBIB", "Posts"] }, "archive": "~/gallery-dl/custom-archive-file-for-TBIB.db", "filename": "{id}_{md5}.{extension}", "sleep-request": [0, 1.2] }, "urlshortener": { "tinyurl": {"root": "https://tinyurl.com"} } }, "downloader": { "#": "restrict download speed to 1 MB/s", "rate": "1M", "#": "show download progress indicator after 2 seconds", "progress": 2.0, "#": "retry failed downloads up to 3 times", "retries": 3, "#": "consider a download 'failed' after 8 seconds of inactivity", "timeout": 8.0, "#": "write '.part' files into a special directory", "part-directory": "/tmp/.download/", "#": "do not update file modification times", "mtime": false, "ytdl": { "#": "use yt-dlp instead of youtube-dl", "module": "yt_dlp" } }, "output": { "log": { "level": "info", "#": "use different ANSI colors for each log level", "format": { "debug" : "\u001b[0;37m{name}: {message}\u001b[0m", "info" : "\u001b[1;37m{name}: {message}\u001b[0m", "warning": "\u001b[1;33m{name}: {message}\u001b[0m", "error" : "\u001b[1;31m{name}: {message}\u001b[0m" } }, "#": "shorten filenames to fit into one terminal line", "#": "while also considering wider East-Asian characters", "shorten": "eaw", "#": "enable ANSI escape sequences on Windows", "ansi": true, "#": "write 
logging messages to a separate file", "logfile": { "path": "~/gallery-dl/log.txt", "mode": "w", "level": "debug" }, "#": "write unrecognized URLs to a separate file", "unsupportedfile": { "path": "~/gallery-dl/unsupported.txt", "mode": "a", "format": "{asctime} {message}", "format-date": "%Y-%m-%d-%H-%M-%S" } }, "postprocessor": { "#": "write 'content' metadata into separate files", "content": { "name" : "metadata", "#": "write data for every post instead of each individual file", "event": "post", "filename": "{post_id|tweet_id|id}.txt", "#": "write only the values for 'content' or 'description'", "mode" : "custom", "format": "{content|description}\n" }, "#": "put files into a '.cbz' archive", "cbz": { "name": "zip", "extension": "cbz" }, "#": "various ugoira post processor configurations to create different file formats", "ugoira-webm": { "name": "ugoira", "extension": "webm", "ffmpeg-args": ["-c:v", "libvpx-vp9", "-an", "-b:v", "0", "-crf", "30"], "ffmpeg-twopass": true, "ffmpeg-demuxer": "image2" }, "ugoira-mp4": { "name": "ugoira", "extension": "mp4", "ffmpeg-args": ["-c:v", "libx264", "-an", "-b:v", "4M", "-preset", "veryslow"], "ffmpeg-twopass": true, "libx264-prevent-odd": true }, "ugoira-gif": { "name": "ugoira", "extension": "gif", "ffmpeg-args": ["-filter_complex", "[0:v] split [a][b];[a] palettegen [p];[b][p] paletteuse"] }, "ugoira-copy": { "name": "ugoira", "extension": "mkv", "ffmpeg-args": ["-c", "copy"], "libx264-prevent-odd": false, "repeat-last-frame": false } }, "#": "use a custom cache file location", "cache": { "file": "~/gallery-dl/cache.sqlite3" } } ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753463920.0 gallery_dl-1.30.2/docs/gallery-dl.conf0000644000175000017500000006530515040736160016255 0ustar00mikemike{ "#": "gallery-dl default configuration file", "#": "full documentation at", "#": "https://gdl-org.github.io/docs/configuration.html", "extractor": { "#": "===============================================================", "#": "==== General Extractor Options ==========================", "#": "(these can be set as site-specific extractor options as well) ", "base-directory": "./gallery-dl/", "postprocessors": null, "skip" : true, "skip-filter" : null, "user-agent" : "auto", "referer" : true, "headers" : {}, "ciphers" : null, "tls12" : true, "browser" : null, "proxy" : null, "proxy-env" : true, "source-address": null, "retries" : 4, "retry-codes" : [], "timeout" : 30.0, "verify" : true, "truststore" : false, "download" : true, "fallback" : true, "archive" : null, "archive-format": null, "archive-prefix": null, "archive-pragma": [], "archive-event" : ["file"], "archive-mode" : "file", "archive-table" : null, "cookies": null, "cookies-select": null, "cookies-update": true, "image-filter" : null, "image-range" : null, "image-unique" : false, "chapter-filter": null, "chapter-range" : null, "chapter-unique": false, "keywords" : {}, "keywords-eval" : false, "keywords-default" : null, "parent-directory": false, "parent-metadata" : false, "parent-skip" : false, "path-restrict": "auto", "path-replace" : "_", "path-remove" : "\\u0000-\\u001f\\u007f", "path-strip" : "auto", "path-extended": true, "metadata-extractor": null, "metadata-http" : null, "metadata-parent" : null, "metadata-path" : null, "metadata-url" : null, "metadata-version" : null, "sleep" : 0, "sleep-request" : 0, "sleep-extractor": 0, "sleep-429" : 60.0, "actions": [], "input" : null, "netrc" : false, "extension-map": { "jpeg": "jpg", "jpe" : "jpg", "jfif": "jpg", "jif" : 
"jpg", "jfi" : "jpg" }, "category-map": {}, "config-map": { "coomerparty" : "coomer", "kemonoparty" : "kemono", "koharu" : "schalenetwork", "chzzk" : "naver-chzzk", "naver" : "naver-blog", "naverwebtoon": "naver-webtoon", "pixiv" : "pixiv-novel" }, "#": "===============================================================", "#": "==== Site-specific Extractor Options ====================", "ao3": { "username": "", "password": "", "sleep-request": "0.5-1.5", "formats": ["pdf"] }, "arcalive": { "sleep-request": "0.5-1.5", "emoticons": false, "gifs" : true }, "artstation": { "external" : false, "max-posts": null, "mviews" : true, "previews" : false, "videos" : true, "search": { "pro-first": true } }, "aryion": { "username": "", "password": "", "recursive": true }, "batoto": { "domain": "auto" }, "bbc": { "width": 1920 }, "behance": { "sleep-request": "2.0-4.0", "modules": ["image", "video", "mediacollection", "embed"] }, "bilibili": { "sleep-request": "3.0-6.0" }, "bluesky": { "username": "", "password": "", "include" : ["media"], "metadata": false, "quoted" : false, "reposts" : false, "videos" : true, "likes": { "depth" : 0, "endpoint": "listRecords" }, "post": { "depth": 0 } }, "boosty": { "allowed" : true, "bought" : false, "metadata": false, "videos" : true }, "bunkr": { "endpoint": "/api/_001_v2", "tlds": false }, "cien": { "sleep-request": "1.0-2.0", "files": ["image", "video", "download", "gallery"] }, "civitai": { "api-key": null, "sleep-request": "0.5-1.5", "api" : "trpc", "files" : ["image"], "include" : ["user-images", "user-videos"], "metadata": false, "nsfw" : true, "quality" : "original=true", "quality-videos": "quality=100" }, "coomer": { "username": "", "password": "", "announcements": false, "comments" : false, "dms" : false, "duplicates" : false, "favorites" : "artist", "files" : ["attachments", "file", "inline"], "max-posts" : null, "metadata" : false, "revisions" : false, "order-revisions": "desc" }, "cyberdrop": { "domain": null }, "dankefuerslesen": { "zip": false }, "deviantart": { "client-id" : null, "client-secret": null, "refresh-token": null, "auto-watch" : false, "auto-unwatch" : false, "comments" : false, "comments-avatars": false, "extra" : false, "flat" : true, "folders" : false, "group" : true, "include" : "gallery", "intermediary" : true, "journals" : "html", "jwt" : false, "mature" : true, "metadata" : false, "original" : true, "pagination" : "api", "previews" : false, "public" : true, "quality" : 100, "wait-min" : 0, "avatar": { "formats": null }, "folder": { "subfolders": true } }, "discord": { "embeds" : ["image", "gifv", "video"], "threads": true, "token" : "" }, "dynastyscans": { "anthology": { "metadata": false } }, "exhentai": { "username": "", "password": "", "cookies" : null, "sleep-request": "3.0-6.0", "domain" : "auto", "fav" : null, "gp" : "resized", "metadata": false, "original": true, "source" : null, "tags" : false, "limits" : null, "limits-action" : "stop", "fallback-retries": 2 }, "facebook": { "cookies": null, "author-followups": false, "include": "photos", "videos" : true }, "fanbox": { "cookies" : null, "comments": false, "embeds" : true, "fee-max" : null, "metadata": false }, "flickr": { "access-token" : null, "access-token-secret": null, "sleep-request" : "1.0-2.0", "contexts": false, "exif" : false, "info" : false, "metadata": false, "profile" : false, "size-max": null, "videos" : true }, "furaffinity": { "cookies" : null, "sleep-request": "1.0", "descriptions": "text", "external" : false, "include" : ["gallery"], "layout" : "auto" }, 
"gelbooru": { "api-key": null, "user-id": null, "favorite": { "order-posts": "desc" } }, "generic": { "enabled": false }, "girlswithmuscle": { "username": "", "password": "" }, "gofile": { "api-token": null, "website-token": null, "recursive": false }, "hentaifoundry": { "include": ["pictures"] }, "hitomi": { "format": "webp" }, "idolcomplex": { "username": "", "password": "", "referer" : false, "sleep-request": "3.0-6.0" }, "imagechest": { "access-token": null }, "imagefap": { "sleep-request": "2.0-4.0" }, "imgbb": { "username": "", "password": "" }, "imgur": { "client-id": null, "mp4": true }, "inkbunny": { "username": "", "password": "", "orderby": "create_datetime" }, "instagram": { "cookies": null, "sleep-request": "6.0-12.0", "api" : "rest", "cursor" : true, "include" : "posts", "max-posts" : null, "metadata" : false, "order-files": "asc", "order-posts": "asc", "previews" : false, "videos" : true, "stories": { "split": false } }, "itaku": { "sleep-request": "0.5-1.5", "include": "gallery", "videos" : true }, "iwara": { "username": "", "password": "", "include": ["user-images", "user-images"] }, "kemono": { "username": "", "password": "", "announcements": false, "archives" : false, "comments" : false, "dms" : false, "duplicates" : false, "endpoint" : "posts", "favorites" : "artist", "files" : ["attachments", "file", "inline"], "max-posts" : null, "metadata" : true, "revisions" : false, "order-revisions": "desc" }, "khinsider": { "covers": false, "format": "mp3" }, "luscious": { "gif": false }, "madokami": { "username": "", "password": "" }, "mangadex": { "client-id" : "", "client-secret": "", "username": "", "password": "", "api-server": "https://api.mangadex.org", "api-parameters": null, "lang": null, "ratings": ["safe", "suggestive", "erotica", "pornographic"] }, "mangoxo": { "username": "", "password": "" }, "naver-blog": { "videos": true }, "naver-chzzk": { "offset": 0 }, "newgrounds": { "username": "", "password": "", "sleep-request": "0.5-1.5", "flash" : true, "format" : "original", "include": ["art"] }, "nsfwalbum": { "referer": false }, "oauth": { "browser": true, "cache" : true, "host" : "localhost", "port" : 6414 }, "paheal": { "metadata": false }, "patreon": { "cookies": null, "cursor" : true, "files" : ["images", "image_large", "attachments", "postfile", "content"], "format-images": "download_url", "user": { "date-max" : 0 } }, "pexels": { "sleep-request": "1.0-2.0" }, "pillowfort": { "username": "", "password": "", "external": false, "inline" : true, "reblogs" : false }, "pinterest": { "domain" : "auto", "sections": true, "stories" : true, "videos" : true }, "pixeldrain": { "api-key" : null, "recursive": false }, "pixiv": { "refresh-token": null, "cookies" : null, "captions" : false, "comments" : false, "include" : ["artworks"], "max-posts": null, "metadata" : false, "metadata-bookmark": false, "sanity" : true, "tags" : "japanese", "ugoira" : true }, "pixiv-novel": { "refresh-token": null, "comments" : false, "max-posts": null, "metadata" : false, "metadata-bookmark": false, "tags" : "japanese", "covers" : false, "embeds" : false, "full-series": false }, "plurk": { "sleep-request": "0.5-1.5", "comments": false }, "poipiku": { "sleep-request": "0.5-1.5" }, "pornpics": { "sleep-request": "0.5-1.5" }, "readcomiconline": { "sleep-request": "3.0-6.0", "captcha": "stop", "quality": "auto" }, "reddit": { "client-id" : null, "user-agent" : null, "refresh-token": null, "comments" : 0, "morecomments": false, "embeds" : true, "date-min" : 0, "date-max" : 253402210800, "date-format" 
: "%Y-%m-%dT%H:%M:%S", "id-min" : null, "id-max" : null, "previews" : true, "recursion" : 0, "selftext" : null, "videos" : true }, "redgifs": { "format": ["hd", "sd", "gif"] }, "rule34xyz": { "username": "", "password": "", "format": ["10", "40", "41", "2"] }, "sankaku": { "username": "", "password": "", "refresh" : false, "tags" : false }, "sankakucomplex": { "embeds": false, "videos": true }, "schalenetwork": { "username": "", "password": "", "sleep-request": "0.5-1.5", "cbz" : true, "format": ["0", "1600", "1280", "980", "780"], "tags" : false }, "scrolller": { "username": "", "password": "", "sleep-request": "0.5-1.5" }, "sexcom": { "gifs": true }, "skeb": { "article" : false, "sent-requests": false, "thumbnails" : false, "search": { "filters": null } }, "smugmug": { "access-token" : null, "access-token-secret": null, "videos": true }, "soundgasm": { "sleep-request": "0.5-1.5" }, "steamgriddb": { "animated" : true, "epilepsy" : true, "humor" : true, "dimensions": "all", "file-types": "all", "languages" : "all,", "nsfw" : true, "sort" : "score_desc", "static" : true, "styles" : "all", "untagged" : true, "download-fake-png": true }, "seiga": { "username": "", "password": "", "cookies" : null }, "subscribestar": { "username": "", "password": "" }, "tapas": { "username": "", "password": "" }, "tenor": { "format": ["gif", "mp4", "webm", "webp"] }, "tiktok": { "audio" : true, "videos": true, "user": { "avatar": true, "module": null, "tiktok-range": "" } }, "tsumino": { "username": "", "password": "" }, "tumblr": { "access-token" : null, "access-token-secret": null, "avatar" : false, "date-min" : 0, "date-max" : null, "external" : false, "inline" : true, "offset" : 0, "original" : true, "pagination": "offset", "posts" : "all", "ratelimit" : "abort", "reblogs" : true, "fallback-delay" : 120.0, "fallback-retries": 2 }, "tumblrgallery": { "referer": false }, "twitter": { "username" : "", "username-alt": "", "password" : "", "cookies" : null, "ads" : false, "cards" : false, "cards-blacklist": [], "csrf" : "cookies", "cursor" : true, "expand" : false, "include" : ["timeline"], "locked" : "abort", "logout" : true, "pinned" : false, "quoted" : false, "ratelimit" : "wait", "relogin" : true, "replies" : true, "retweets" : false, "size" : ["orig", "4096x4096", "large", "medium", "small"], "text-tweets" : false, "tweet-endpoint": "auto", "transform" : true, "twitpic" : false, "unavailable" : false, "unique" : true, "users" : "user", "videos" : true, "timeline": { "strategy": "auto" }, "tweet": { "conversations": false } }, "unsplash": { "format": "raw" }, "urlgalleries": { "sleep-request": "0.5-1.5" }, "vipergirls": { "username": "", "password": "", "sleep-request": "0.5", "domain" : "viper.click", "like" : false }, "vk": { "sleep-request": "0.5-1.5", "offset": 0 }, "vsco": { "include": ["gallery"], "videos" : true }, "wallhaven": { "api-key" : null, "sleep-request": "1.4", "include" : ["uploads"], "metadata": false }, "weasyl": { "api-key" : null, "metadata": false }, "webtoons": { "sleep-request": "0.5-1.5", "quality" : "original", "banners" : false, "thumbnails": false }, "weebcentral": { "sleep-request": "0.5-1.5" }, "weibo": { "sleep-request": "1.0-2.0", "gifs" : true, "include" : ["feed"], "livephoto": true, "movies" : false, "retweets" : false, "videos" : true }, "xfolio": { "sleep-request": "0.5-1.5" }, "ytdl": { "cmdline-args": null, "config-file" : null, "deprecations": false, "enabled" : false, "format" : null, "generic" : true, "generic-category": true, "logging" : true, "module" : null, 
"raw-options" : null }, "zerochan": { "username": "", "password": "", "sleep-request": "0.5-1.5", "metadata" : false, "pagination": "api", "redirects" : false }, "#": "===============================================================", "#": "==== Base-Extractor and Instance Options ================", "blogger": { "api-key": null, "videos" : true }, "Danbooru": { "sleep-request": "0.5-1.5", "external" : false, "metadata" : false, "threshold": "auto", "ugoira" : false, "favgroup": { "order-posts": "pool" }, "pool": { "order-posts": "pool" } }, "danbooru": { "username": "", "password": "" }, "atfbooru": { "username": "", "password": "" }, "aibooru": { "username": "", "password": "" }, "booruvar": { "username": "", "password": "" }, "E621": { "sleep-request": "0.5-1.5", "metadata" : false, "threshold": "auto" }, "e621": { "username": "", "password": "" }, "e926": { "username": "", "password": "" }, "e6ai": { "username": "", "password": "" }, "foolfuuka": { "sleep-request": "0.5-1.5" }, "archivedmoe": { "referer": false }, "mastodon": { "access-token": null, "cards" : false, "reblogs" : false, "replies" : true, "text-posts" : false }, "misskey": { "access-token": null, "include" : ["notes"], "renotes" : false, "replies" : true }, "Nijie": { "sleep-request": "2.0-4.0", "include" : ["illustration", "doujin"] }, "nijie": { "username": "", "password": "" }, "horne": { "username": "", "password": "" }, "nitter": { "quoted" : false, "retweets": false, "videos" : true }, "philomena": { "api-key": null, "sleep-request": "0.5-1.5", "svg" : true, "filter": 2 }, "derpibooru": { "filter": 56027 }, "ponybooru": { "filter": 3 }, "twibooru": { "sleep-request": "6.0-6.1" }, "postmill": { "save-link-post-body": false }, "reactor": { "sleep-request": "3.0-6.0", "gif": false }, "wikimedia": { "sleep-request": "1.0-2.0", "limit": 50, "subcategories": true }, "booru": { "tags" : false, "notes": false, "url" : "file_url" } }, "#": "===================================================================", "#": "==== Downloader Options =====================================", "downloader": { "filesize-min" : null, "filesize-max" : null, "mtime" : true, "part" : true, "part-directory": null, "progress" : 3.0, "proxy" : null, "rate" : null, "retries" : 4, "timeout" : 30.0, "verify" : true, "http": { "adjust-extensions": true, "chunk-size" : 32768, "consume-content" : false, "enabled" : true, "headers" : null, "retry-codes" : [], "sleep-429" : 60.0, "validate" : true, "validate-html" : true }, "ytdl": { "cmdline-args" : null, "config-file" : null, "deprecations" : false, "enabled" : true, "format" : null, "forward-cookies": true, "logging" : true, "module" : null, "outtmpl" : null, "raw-options" : null } }, "#": "===================================================================", "#": "==== Output Options =========================================", "output": { "ansi" : true, "fallback" : true, "mode" : "auto", "private" : false, "progress" : true, "shorten" : true, "skip" : true, "stdin" : null, "stdout" : null, "stderr" : null, "log" : "[{name}][{levelname}] {message}", "logfile" : null, "errorfile": null, "unsupportedfile": null, "colors" : { "success": "1;32", "skip" : "2", "debug" : "0;37", "info" : "1;37", "warning": "1;33", "error" : "1;31" } } } ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1753638554.0477083 gallery_dl-1.30.2/gallery_dl/0000755000175000017500000000000015041463232014525 5ustar00mikemike././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 
mtime=1753637879.0 gallery_dl-1.30.2/gallery_dl/__init__.py0000644000175000017500000005035315041461767016657 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2014-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. import sys import logging from . import version, config, option, output, extractor, job, util, exception __author__ = "Mike Fährmann" __copyright__ = "Copyright 2014-2025 Mike Fährmann" __license__ = "GPLv2" __maintainer__ = "Mike Fährmann" __email__ = "mike_faehrmann@web.de" __version__ = version.__version__ def main(): try: parser = option.build_parser() args = parser.parse_args() log = output.initialize_logging(args.loglevel) # configuration if args.config_load: config.load() if args.configs_json: config.load(args.configs_json, strict=True) if args.configs_yaml: import yaml config.load(args.configs_yaml, strict=True, loads=yaml.safe_load) if args.configs_toml: try: import tomllib as toml except ImportError: import toml config.load(args.configs_toml, strict=True, loads=toml.loads) if not args.colors: output.ANSI = False config.set((), "colors", False) if util.WINDOWS: config.set(("output",), "ansi", False) if args.filename: filename = args.filename if filename == "/O": filename = "{filename}.{extension}" elif filename.startswith("\\f"): filename = "\f" + filename[2:] config.set((), "filename", filename) if args.directory is not None: config.set((), "base-directory", args.directory) config.set((), "directory", ()) if args.postprocessors: config.set((), "postprocessors", args.postprocessors) if args.abort: config.set((), "skip", "abort:" + str(args.abort)) if args.terminate: config.set((), "skip", "terminate:" + str(args.terminate)) if args.cookies_from_browser: browser, _, profile = args.cookies_from_browser.partition(":") browser, _, keyring = browser.partition("+") browser, _, domain = browser.partition("/") if profile and profile[0] == ":": container = profile[1:] profile = None else: profile, _, container = profile.partition("::") config.set((), "cookies", ( browser, profile, keyring, container, domain)) if args.options_pp: config.set((), "postprocessor-options", args.options_pp) for opts in args.options: config.set(*opts) output.configure_standard_streams() # signals if signals := config.get((), "signals-ignore"): import signal if isinstance(signals, str): signals = signals.split(",") for signal_name in signals: signal_num = getattr(signal, signal_name, None) if signal_num is None: log.warning("signal '%s' is not defined", signal_name) else: signal.signal(signal_num, signal.SIG_IGN) if signals := config.get((), "signals-actions"): from . 
import actions actions.parse_signals(signals) # enable ANSI escape sequences on Windows if util.WINDOWS and config.get(("output",), "ansi", output.COLORS): from ctypes import windll, wintypes, byref kernel32 = windll.kernel32 mode = wintypes.DWORD() for handle_id in (-11, -12): # stdout and stderr handle = kernel32.GetStdHandle(handle_id) kernel32.GetConsoleMode(handle, byref(mode)) if not mode.value & 0x4: mode.value |= 0x4 kernel32.SetConsoleMode(handle, mode) output.ANSI = True # filter environment filterenv = config.get((), "filters-environment", True) if filterenv is True: pass elif not filterenv: util.compile_expression = util.compile_expression_raw elif isinstance(filterenv, str): if filterenv == "raw": util.compile_expression = util.compile_expression_raw elif filterenv.startswith("default"): util.compile_expression = util.compile_expression_defaultdict # format string separator if separator := config.get((), "format-separator"): from . import formatter formatter._SEPARATOR = separator # eval globals if path := config.get((), "globals"): util.GLOBALS.update(util.import_file(path).__dict__) # loglevels output.configure_logging(args.loglevel) if args.loglevel >= logging.WARNING: config.set(("output",), "mode", "null") config.set(("downloader",), "progress", None) elif args.loglevel <= logging.DEBUG: import platform import requests if util.EXECUTABLE: extra = f" - Executable ({version.__variant__})" elif git_head := util.git_head(): extra = " - Git HEAD: " + git_head else: extra = "" log.debug("Version %s%s", __version__, extra) log.debug("Python %s - %s", platform.python_version(), platform.platform()) try: log.debug("requests %s - urllib3 %s", requests.__version__, requests.packages.urllib3.__version__) except AttributeError: pass log.debug("Configuration Files %s", config._files) if args.clear_cache: from . import cache log = logging.getLogger("cache") cnt = cache.clear(args.clear_cache) if cnt is None: log.error("Database file not available") return 1 log.info("Deleted %d entr%s from '%s'", cnt, "y" if cnt == 1 else "ies", cache._path()) return 0 if args.config: if args.config == "init": return config.initialize() elif args.config == "status": return config.status() else: return config.open_extern() if args.print_traffic: import requests requests.packages.urllib3.connection.HTTPConnection.debuglevel = 1 if args.update: from . 
import update extr = update.UpdateExtractor.from_url("update:" + args.update) ujob = update.UpdateJob(extr) return ujob.run() # category renaming config.remap_categories() # extractor modules modules = config.get(("extractor",), "modules") if modules is not None: if isinstance(modules, str): modules = modules.split(",") extractor.modules = modules # external modules if args.extractor_sources: sources = args.extractor_sources sources.append(None) else: sources = config.get(("extractor",), "module-sources") if sources: import os modules = [] for source in sources: if source: path = util.expand_path(source) try: files = os.listdir(path) modules.append(extractor._modules_path(path, files)) except Exception as exc: log.warning("Unable to load modules from %s (%s: %s)", path, exc.__class__.__name__, exc) else: modules.append(extractor._modules_internal()) if len(modules) > 1: import itertools extractor._module_iter = itertools.chain(*modules) elif not modules: extractor._module_iter = () else: extractor._module_iter = iter(modules[0]) if args.list_modules: extractor.modules.append("") sys.stdout.write("\n".join(extractor.modules)) elif args.list_extractors is not None: write = sys.stdout.write fmt = ("{}{}\nCategory: {} - Subcategory: {}" "\nExample : {}\n\n").format extractors = extractor.extractors() if args.list_extractors: fltr = util.build_extractor_filter( args.list_extractors, negate=False) extractors = filter(fltr, extractors) for extr in extractors: write(fmt( extr.__name__, "\n" + extr.__doc__ if extr.__doc__ else "", extr.category, extr.subcategory, extr.example, )) else: if input_files := config.get((), "input-files"): for input_file in input_files: if isinstance(input_file, str): input_file = (input_file, None) args.input_files.append(input_file) if not args.urls and not args.input_files: if args.cookies_from_browser or config.interpolate( ("extractor",), "cookies"): args.urls.append("noop") else: parser.error( "The following arguments are required: URL\nUse " "'gallery-dl --help' to get a list of all options.") if args.list_urls: jobtype = job.UrlJob jobtype.maxdepth = args.list_urls if config.get(("output",), "fallback", True): jobtype.handle_url = jobtype.handle_url_fallback elif args.dump_json: jobtype = job.DataJob jobtype.resolve = args.dump_json - 1 else: jobtype = args.jobtype or job.DownloadJob input_manager = InputManager() input_manager.log = input_log = logging.getLogger("inputfile") # unsupported file logging handler if handler := output.setup_logging_handler( "unsupportedfile", fmt="{message}"): ulog = job.Job.ulog = logging.getLogger("unsupported") ulog.addHandler(handler) ulog.propagate = False # error file logging handler if handler := output.setup_logging_handler( "errorfile", fmt="{message}", mode="a"): elog = input_manager.err = logging.getLogger("errorfile") elog.addHandler(handler) elog.propagate = False # collect input URLs input_manager.add_list(args.urls) if args.input_files: for input_file, action in args.input_files: try: path = util.expand_path(input_file) input_manager.add_file(path, action) except Exception as exc: input_log.error(exc) return getattr(exc, "code", 128) pformat = config.get(("output",), "progress", True) if pformat and len(input_manager.urls) > 1 and \ args.loglevel < logging.ERROR: input_manager.progress(pformat) if catmap := config.interpolate(("extractor",), "category-map"): if catmap == "compat": catmap = { "coomer" : "coomerparty", "kemono" : "kemonoparty", "schalenetwork": "koharu", "naver-blog" : "naver", "naver-chzzk" : "chzzk", 
"naver-webtoon": "naverwebtoon", "pixiv-novel" : "pixiv", "pixiv-novel:novel" : ("pixiv", "novel"), "pixiv-novel:user" : ("pixiv", "novel-user"), "pixiv-novel:series" : ("pixiv", "novel-series"), "pixiv-novel:bookmark": ("pixiv", "novel-bookmark"), } from .extractor import common common.CATEGORY_MAP = catmap # process input URLs retval = 0 for url in input_manager: try: log.debug("Starting %s for '%s'", jobtype.__name__, url) if isinstance(url, ExtendedUrl): for opts in url.gconfig: config.set(*opts) with config.apply(url.lconfig): status = jobtype(url.value).run() else: status = jobtype(url).run() if status: retval |= status input_manager.error() else: input_manager.success() except exception.RestartExtraction: log.debug("Restarting '%s'", url) continue except exception.ControlException: pass except exception.NoExtractorError: log.error("Unsupported URL '%s'", url) retval |= 64 input_manager.error() input_manager.next() return retval return 0 except KeyboardInterrupt: raise SystemExit("\nKeyboardInterrupt") except BrokenPipeError: pass except OSError as exc: import errno if exc.errno != errno.EPIPE: raise return 1 class InputManager(): def __init__(self): self.urls = [] self.files = () self.log = self.err = None self._url = "" self._item = None self._index = 0 self._pformat = None def add_url(self, url): self.urls.append(url) def add_list(self, urls): self.urls += urls def add_file(self, path, action=None): """Process an input file. Lines starting with '#' and empty lines will be ignored. Lines starting with '-' will be interpreted as a key-value pair separated by an '='. where 'key' is a dot-separated option name and 'value' is a JSON-parsable string. These configuration options will be applied while processing the next URL only. Lines starting with '-G' are the same as above, except these options will be applied for *all* following URLs, i.e. they are Global. Everything else will be used as a potential URL. Example input file: # settings global options -G base-directory = "/tmp/" -G skip = false # setting local options for the next URL -filename="spaces_are_optional.jpg" -skip = true https://example.org/ # next URL uses default filename and 'skip' is false. 
https://example.com/index.htm # comment1 https://example.com/404.htm # comment2 """ if path == "-" and not action: try: lines = sys.stdin.readlines() except Exception: raise exception.InputFileError("stdin is not readable") path = None else: try: with open(path, encoding="utf-8") as fp: lines = fp.readlines() except Exception as exc: raise exception.InputFileError(str(exc)) if self.files: self.files[path] = lines else: self.files = {path: lines} if action == "c": action = self._action_comment elif action == "d": action = self._action_delete else: action = None gconf = [] lconf = [] indicies = [] strip_comment = None append = self.urls.append for n, line in enumerate(lines): line = line.strip() if not line or line[0] == "#": # empty line or comment continue elif line[0] == "-": # config spec if len(line) >= 2 and line[1] == "G": conf = gconf line = line[2:] else: conf = lconf line = line[1:] if action: indicies.append(n) key, sep, value = line.partition("=") if not sep: raise exception.InputFileError( f"Invalid KEY=VALUE pair '{line}' " f"on line {n+1} in {path}") try: value = util.json_loads(value.strip()) except ValueError as exc: self.log.debug("%s: %s", exc.__class__.__name__, exc) raise exception.InputFileError( f"Unable to parse '{value}' on line {n+1} in {path}") key = key.strip().split(".") conf.append((key[:-1], key[-1], value)) else: # url if " #" in line or "\t#" in line: if strip_comment is None: strip_comment = util.re(r"\s+#.*").sub line = strip_comment("", line) if gconf or lconf: url = ExtendedUrl(line, gconf, lconf) gconf = [] lconf = [] else: url = line if action: indicies.append(n) append((url, path, action, indicies)) indicies = [] else: append(url) def progress(self, pformat=True): if pformat is True: pformat = "[{current}/{total}] {url}\n" else: pformat += "\n" self._pformat = pformat.format_map def next(self): self._index += 1 def success(self): if self._item: self._rewrite() def error(self): if self.err: if self._item: url, path, action, indicies = self._item lines = self.files[path] out = "".join(lines[i] for i in indicies) if out and out[-1] == "\n": out = out[:-1] self._rewrite() else: out = str(self._url) self.err.info(out) def _rewrite(self): url, path, action, indicies = self._item lines = self.files[path] action(lines, indicies) try: with open(path, "w", encoding="utf-8") as fp: fp.writelines(lines) except Exception as exc: self.log.warning( "Unable to update '%s' (%s: %s)", path, exc.__class__.__name__, exc) def _action_comment(self, lines, indicies): for i in indicies: lines[i] = "# " + lines[i] def _action_delete(self, lines, indicies): for i in indicies: lines[i] = "" def __iter__(self): self._index = 0 return self def __next__(self): try: url = self.urls[self._index] except IndexError: raise StopIteration if isinstance(url, tuple): self._item = url url = url[0] else: self._item = None self._url = url if self._pformat: output.stderr_write(self._pformat({ "total" : len(self.urls), "current": self._index + 1, "url" : url, })) return url class ExtendedUrl(): """URL with attached config key-value pairs""" __slots__ = ("value", "gconfig", "lconfig") def __init__(self, url, gconf, lconf): self.value = url self.gconfig = gconf self.lconfig = lconf def __str__(self): return self.value ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/__main__.py0000644000175000017500000000105715040344700016617 0ustar00mikemike#!/usr/bin/env python3 # -*- coding: utf-8 -*- # Copyright 2017-2023 Mike Fährmann # 
# This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. import sys if not __package__ and not hasattr(sys, "frozen"): import os.path path = os.path.realpath(os.path.abspath(__file__)) sys.path.insert(0, os.path.dirname(os.path.dirname(path))) import gallery_dl if __name__ == "__main__": raise SystemExit(gallery_dl.main()) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/actions.py0000644000175000017500000001651415040344700016543 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2023-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """ """ import time import logging import operator import functools from . import util, exception def parse_logging(actionspec): if isinstance(actionspec, dict): actionspec = actionspec.items() actions = {} actions[-logging.DEBUG] = actions_bd = [] actions[-logging.INFO] = actions_bi = [] actions[-logging.WARNING] = actions_bw = [] actions[-logging.ERROR] = actions_be = [] actions[logging.DEBUG] = actions_ad = [] actions[logging.INFO] = actions_ai = [] actions[logging.WARNING] = actions_aw = [] actions[logging.ERROR] = actions_ae = [] for event, spec in actionspec: level, _, pattern = event.partition(":") search = util.re(pattern).search if pattern else util.true if isinstance(spec, str): type, _, args = spec.partition(" ") before, after = ACTIONS[type](args) else: actions_before = [] actions_after = [] for s in spec: type, _, args = s.partition(" ") before, after = ACTIONS[type](args) if before: actions_before.append(before) if after: actions_after.append(after) before = _chain_actions(actions_before) after = _chain_actions(actions_after) level = level.strip() if not level or level == "*": if before: action = (search, before) actions_bd.append(action) actions_bi.append(action) actions_bw.append(action) actions_be.append(action) if after: action = (search, after) actions_ad.append(action) actions_ai.append(action) actions_aw.append(action) actions_ae.append(action) else: level = _level_to_int(level) if before: actions[-level].append((search, before)) if after: actions[level].append((search, after)) return actions def parse_signals(actionspec): import signal if isinstance(actionspec, dict): actionspec = actionspec.items() for signal_name, spec in actionspec: signal_num = getattr(signal, signal_name, None) if signal_num is None: log = logging.getLogger("gallery-dl") log.warning("signal '%s' is not defined", signal_name) continue if isinstance(spec, str): type, _, args = spec.partition(" ") before, after = ACTIONS[type](args) action = before if after is None else after else: actions_before = [] actions_after = [] for s in spec: type, _, args = s.partition(" ") before, after = ACTIONS[type](args) if before is not None: actions_before.append(before) if after is not None: actions_after.append(after) actions = actions_before actions.extend(actions_after) action = _chain_actions(actions) signal.signal(signal_num, signals_handler(action)) class LoggerAdapter(): def __init__(self, logger, job): self.logger = logger self.extra = job._logger_extra self.actions = job._logger_actions self.debug = functools.partial(self.log, logging.DEBUG) self.info = functools.partial(self.log, logging.INFO) self.warning = 
functools.partial(self.log, logging.WARNING) self.error = functools.partial(self.log, logging.ERROR) def log(self, level, msg, *args, **kwargs): msg = str(msg) if args: msg = msg % args before = self.actions[-level] after = self.actions[level] if before: args = self.extra.copy() args["level"] = level for cond, action in before: if cond(msg): action(args) level = args["level"] if self.logger.isEnabledFor(level): kwargs["extra"] = self.extra self.logger._log(level, msg, (), **kwargs) if after: args = self.extra.copy() for cond, action in after: if cond(msg): action(args) def _level_to_int(level): try: return logging._nameToLevel[level] except KeyError: return int(level) def _chain_actions(actions): def _chain(args): for action in actions: action(args) return _chain def signals_handler(action, args={}): def handler(signal_num, frame): action(args) return handler # -------------------------------------------------------------------- def action_print(opts): def _print(_): print(opts) return None, _print def action_status(opts): op, value = util.re(r"\s*([&|^=])=?\s*(\d+)").match(opts).groups() op = { "&": operator.and_, "|": operator.or_, "^": operator.xor, "=": lambda x, y: y, }[op] value = int(value) def _status(args): args["job"].status = op(args["job"].status, value) return _status, None def action_level(opts): level = _level_to_int(opts.lstrip(" ~=")) def _level(args): args["level"] = level return _level, None def action_exec(opts): def _exec(_): util.Popen(opts, shell=True).wait() return None, _exec def action_wait(opts): if opts: seconds = util.build_duration_func(opts) def _wait(args): time.sleep(seconds()) else: def _wait(args): input("Press Enter to continue") return None, _wait def action_flag(opts): flag, value = util.re( r"(?i)(file|post|child|download)(?:\s*[= ]\s*(.+))?" ).match(opts).groups() flag = flag.upper() value = "stop" if value is None else value.lower() def _flag(args): util.FLAGS.__dict__[flag] = value return _flag, None def action_raise(opts): name, _, arg = opts.partition(" ") exc = getattr(exception, name, None) if exc is None: import builtins exc = getattr(builtins, name, Exception) if arg: def _raise(args): raise exc(arg) else: def _raise(args): raise exc() return None, _raise def action_abort(opts): return None, util.raises(exception.StopExtraction) def action_terminate(opts): return None, util.raises(exception.TerminateExtraction) def action_restart(opts): return None, util.raises(exception.RestartExtraction) def action_exit(opts): try: opts = int(opts) except ValueError: pass def _exit(args): raise SystemExit(opts) return None, _exit ACTIONS = { "abort" : action_abort, "exec" : action_exec, "exit" : action_exit, "flag" : action_flag, "level" : action_level, "print" : action_print, "raise" : action_raise, "restart" : action_restart, "status" : action_status, "terminate": action_terminate, "wait" : action_wait, } ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/aes.py0000644000175000017500000005151715040344700015655 0ustar00mikemike# -*- coding: utf-8 -*- # This is a slightly modified version of yt-dlp's aes module. 
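# Illustrative usage sketch (not part of the original module): the pure-Python
# fallback implementation below can round-trip data through CBC mode. This
# assumes the module is importable as gallery_dl.aes; key, IV, and message are
# placeholder values.
#
#   from gallery_dl import aes
#   key = list(range(16))                  # 16-byte key as a list of ints
#   iv = [0] * 16                          # 16-byte initialization vector
#   data = aes.bytes_to_intlist(b"secret message")
#   ct = aes.aes_cbc_encrypt(data, key, iv)            # PKCS#7-padded blocks
#   pt = aes.unpad_pkcs7(
#       aes.intlist_to_bytes(aes.aes_cbc_decrypt(ct, key, iv)))
#   assert pt == b"secret message"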
# https://github.com/yt-dlp/yt-dlp/blob/master/yt_dlp/aes.py import struct import binascii from math import ceil try: from Cryptodome.Cipher import AES as Cryptodome_AES except ImportError: try: from Crypto.Cipher import AES as Cryptodome_AES except ImportError: Cryptodome_AES = None except Exception as exc: Cryptodome_AES = None import logging logging.getLogger("aes").warning( "Error when trying to import 'Cryptodome' module (%s: %s)", exc.__class__.__name__, exc) del logging if Cryptodome_AES: def aes_cbc_decrypt_bytes(data, key, iv): """Decrypt bytes with AES-CBC using pycryptodome""" return Cryptodome_AES.new( key, Cryptodome_AES.MODE_CBC, iv).decrypt(data) def aes_gcm_decrypt_and_verify_bytes(data, key, tag, nonce): """Decrypt bytes with AES-GCM using pycryptodome""" return Cryptodome_AES.new( key, Cryptodome_AES.MODE_GCM, nonce).decrypt_and_verify(data, tag) else: def aes_cbc_decrypt_bytes(data, key, iv): """Decrypt bytes with AES-CBC using native implementation""" return intlist_to_bytes(aes_cbc_decrypt( bytes_to_intlist(data), bytes_to_intlist(key), bytes_to_intlist(iv), )) def aes_gcm_decrypt_and_verify_bytes(data, key, tag, nonce): """Decrypt bytes with AES-GCM using native implementation""" return intlist_to_bytes(aes_gcm_decrypt_and_verify( bytes_to_intlist(data), bytes_to_intlist(key), bytes_to_intlist(tag), bytes_to_intlist(nonce), )) bytes_to_intlist = list def intlist_to_bytes(xs): if not xs: return b"" return struct.pack(f"{len(xs)}B", *xs) def unpad_pkcs7(data): return data[:-data[-1]] BLOCK_SIZE_BYTES = 16 def aes_ecb_encrypt(data, key, iv=None): """ Encrypt with aes in ECB mode @param {int[]} data cleartext @param {int[]} key 16/24/32-Byte cipher key @param {int[]} iv Unused for this mode @returns {int[]} encrypted data """ expanded_key = key_expansion(key) block_count = ceil(len(data) / BLOCK_SIZE_BYTES) encrypted_data = [] for i in range(block_count): block = data[i * BLOCK_SIZE_BYTES: (i + 1) * BLOCK_SIZE_BYTES] encrypted_data += aes_encrypt(block, expanded_key) encrypted_data = encrypted_data[:len(data)] return encrypted_data def aes_ecb_decrypt(data, key, iv=None): """ Decrypt with aes in ECB mode @param {int[]} data cleartext @param {int[]} key 16/24/32-Byte cipher key @param {int[]} iv Unused for this mode @returns {int[]} decrypted data """ expanded_key = key_expansion(key) block_count = ceil(len(data) / BLOCK_SIZE_BYTES) encrypted_data = [] for i in range(block_count): block = data[i * BLOCK_SIZE_BYTES: (i + 1) * BLOCK_SIZE_BYTES] encrypted_data += aes_decrypt(block, expanded_key) encrypted_data = encrypted_data[:len(data)] return encrypted_data def aes_ctr_decrypt(data, key, iv): """ Decrypt with aes in counter mode @param {int[]} data cipher @param {int[]} key 16/24/32-Byte cipher key @param {int[]} iv 16-Byte initialization vector @returns {int[]} decrypted data """ return aes_ctr_encrypt(data, key, iv) def aes_ctr_encrypt(data, key, iv): """ Encrypt with aes in counter mode @param {int[]} data cleartext @param {int[]} key 16/24/32-Byte cipher key @param {int[]} iv 16-Byte initialization vector @returns {int[]} encrypted data """ expanded_key = key_expansion(key) block_count = ceil(len(data) / BLOCK_SIZE_BYTES) counter = iter_vector(iv) encrypted_data = [] for i in range(block_count): counter_block = next(counter) block = data[i * BLOCK_SIZE_BYTES: (i + 1) * BLOCK_SIZE_BYTES] block += [0] * (BLOCK_SIZE_BYTES - len(block)) cipher_counter_block = aes_encrypt(counter_block, expanded_key) encrypted_data += xor(block, cipher_counter_block) encrypted_data = 
encrypted_data[:len(data)] return encrypted_data def aes_cbc_decrypt(data, key, iv): """ Decrypt with aes in CBC mode @param {int[]} data cipher @param {int[]} key 16/24/32-Byte cipher key @param {int[]} iv 16-Byte IV @returns {int[]} decrypted data """ expanded_key = key_expansion(key) block_count = ceil(len(data) / BLOCK_SIZE_BYTES) decrypted_data = [] previous_cipher_block = iv for i in range(block_count): block = data[i * BLOCK_SIZE_BYTES: (i + 1) * BLOCK_SIZE_BYTES] block += [0] * (BLOCK_SIZE_BYTES - len(block)) decrypted_block = aes_decrypt(block, expanded_key) decrypted_data += xor(decrypted_block, previous_cipher_block) previous_cipher_block = block decrypted_data = decrypted_data[:len(data)] return decrypted_data def aes_cbc_encrypt(data, key, iv): """ Encrypt with aes in CBC mode. Using PKCS#7 padding @param {int[]} data cleartext @param {int[]} key 16/24/32-Byte cipher key @param {int[]} iv 16-Byte IV @returns {int[]} encrypted data """ expanded_key = key_expansion(key) block_count = ceil(len(data) / BLOCK_SIZE_BYTES) encrypted_data = [] previous_cipher_block = iv for i in range(block_count): block = data[i * BLOCK_SIZE_BYTES: (i + 1) * BLOCK_SIZE_BYTES] remaining_length = BLOCK_SIZE_BYTES - len(block) block += [remaining_length] * remaining_length mixed_block = xor(block, previous_cipher_block) encrypted_block = aes_encrypt(mixed_block, expanded_key) encrypted_data += encrypted_block previous_cipher_block = encrypted_block return encrypted_data def aes_gcm_decrypt_and_verify(data, key, tag, nonce): """ Decrypt with aes in GBM mode and checks authenticity using tag @param {int[]} data cipher @param {int[]} key 16-Byte cipher key @param {int[]} tag authentication tag @param {int[]} nonce IV (recommended 12-Byte) @returns {int[]} decrypted data """ # XXX: check aes, gcm param hash_subkey = aes_encrypt([0] * BLOCK_SIZE_BYTES, key_expansion(key)) if len(nonce) == 12: j0 = nonce + [0, 0, 0, 1] else: fill = (BLOCK_SIZE_BYTES - (len(nonce) % BLOCK_SIZE_BYTES)) % \ BLOCK_SIZE_BYTES + 8 ghash_in = nonce + [0] * fill + bytes_to_intlist( (8 * len(nonce)).to_bytes(8, "big")) j0 = ghash(hash_subkey, ghash_in) # TODO: add nonce support to aes_ctr_decrypt # nonce_ctr = j0[:12] iv_ctr = inc(j0) decrypted_data = aes_ctr_decrypt( data, key, iv_ctr + [0] * (BLOCK_SIZE_BYTES - len(iv_ctr))) pad_len = ( (BLOCK_SIZE_BYTES - (len(data) % BLOCK_SIZE_BYTES)) % BLOCK_SIZE_BYTES) s_tag = ghash( hash_subkey, data + [0] * pad_len + # pad bytes_to_intlist( (0 * 8).to_bytes(8, "big") + # length of associated data ((len(data) * 8).to_bytes(8, "big")) # length of data ) ) if tag != aes_ctr_encrypt(s_tag, key, j0): raise ValueError("Mismatching authentication tag") return decrypted_data def aes_encrypt(data, expanded_key): """ Encrypt one block with aes @param {int[]} data 16-Byte state @param {int[]} expanded_key 176/208/240-Byte expanded key @returns {int[]} 16-Byte cipher """ rounds = len(expanded_key) // BLOCK_SIZE_BYTES - 1 data = xor(data, expanded_key[:BLOCK_SIZE_BYTES]) for i in range(1, rounds + 1): data = sub_bytes(data) data = shift_rows(data) if i != rounds: data = list(iter_mix_columns(data, MIX_COLUMN_MATRIX)) data = xor(data, expanded_key[ i * BLOCK_SIZE_BYTES: (i + 1) * BLOCK_SIZE_BYTES]) return data def aes_decrypt(data, expanded_key): """ Decrypt one block with aes @param {int[]} data 16-Byte cipher @param {int[]} expanded_key 176/208/240-Byte expanded key @returns {int[]} 16-Byte state """ rounds = len(expanded_key) // BLOCK_SIZE_BYTES - 1 for i in range(rounds, 0, -1): data = xor(data, 
expanded_key[ i * BLOCK_SIZE_BYTES: (i + 1) * BLOCK_SIZE_BYTES]) if i != rounds: data = list(iter_mix_columns(data, MIX_COLUMN_MATRIX_INV)) data = shift_rows_inv(data) data = sub_bytes_inv(data) data = xor(data, expanded_key[:BLOCK_SIZE_BYTES]) return data def aes_decrypt_text(data, password, key_size_bytes): """ Decrypt text - The first 8 Bytes of decoded 'data' are the 8 high Bytes of the counter - The cipher key is retrieved by encrypting the first 16 Byte of 'password' with the first 'key_size_bytes' Bytes from 'password' (if necessary filled with 0's) - Mode of operation is 'counter' @param {str} data Base64 encoded string @param {str,unicode} password Password (will be encoded with utf-8) @param {int} key_size_bytes Possible values: 16 for 128-Bit, 24 for 192-Bit, or 32 for 256-Bit @returns {str} Decrypted data """ NONCE_LENGTH_BYTES = 8 data = bytes_to_intlist(binascii.a2b_base64(data)) password = bytes_to_intlist(password.encode("utf-8")) key = password[:key_size_bytes] + [0] * (key_size_bytes - len(password)) key = aes_encrypt(key[:BLOCK_SIZE_BYTES], key_expansion(key)) * \ (key_size_bytes // BLOCK_SIZE_BYTES) nonce = data[:NONCE_LENGTH_BYTES] cipher = data[NONCE_LENGTH_BYTES:] return intlist_to_bytes(aes_ctr_decrypt( cipher, key, nonce + [0] * (BLOCK_SIZE_BYTES - NONCE_LENGTH_BYTES) )) RCON = ( 0x8d, 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1b, 0x36, ) SBOX = ( 0x63, 0x7C, 0x77, 0x7B, 0xF2, 0x6B, 0x6F, 0xC5, 0x30, 0x01, 0x67, 0x2B, 0xFE, 0xD7, 0xAB, 0x76, 0xCA, 0x82, 0xC9, 0x7D, 0xFA, 0x59, 0x47, 0xF0, 0xAD, 0xD4, 0xA2, 0xAF, 0x9C, 0xA4, 0x72, 0xC0, 0xB7, 0xFD, 0x93, 0x26, 0x36, 0x3F, 0xF7, 0xCC, 0x34, 0xA5, 0xE5, 0xF1, 0x71, 0xD8, 0x31, 0x15, 0x04, 0xC7, 0x23, 0xC3, 0x18, 0x96, 0x05, 0x9A, 0x07, 0x12, 0x80, 0xE2, 0xEB, 0x27, 0xB2, 0x75, 0x09, 0x83, 0x2C, 0x1A, 0x1B, 0x6E, 0x5A, 0xA0, 0x52, 0x3B, 0xD6, 0xB3, 0x29, 0xE3, 0x2F, 0x84, 0x53, 0xD1, 0x00, 0xED, 0x20, 0xFC, 0xB1, 0x5B, 0x6A, 0xCB, 0xBE, 0x39, 0x4A, 0x4C, 0x58, 0xCF, 0xD0, 0xEF, 0xAA, 0xFB, 0x43, 0x4D, 0x33, 0x85, 0x45, 0xF9, 0x02, 0x7F, 0x50, 0x3C, 0x9F, 0xA8, 0x51, 0xA3, 0x40, 0x8F, 0x92, 0x9D, 0x38, 0xF5, 0xBC, 0xB6, 0xDA, 0x21, 0x10, 0xFF, 0xF3, 0xD2, 0xCD, 0x0C, 0x13, 0xEC, 0x5F, 0x97, 0x44, 0x17, 0xC4, 0xA7, 0x7E, 0x3D, 0x64, 0x5D, 0x19, 0x73, 0x60, 0x81, 0x4F, 0xDC, 0x22, 0x2A, 0x90, 0x88, 0x46, 0xEE, 0xB8, 0x14, 0xDE, 0x5E, 0x0B, 0xDB, 0xE0, 0x32, 0x3A, 0x0A, 0x49, 0x06, 0x24, 0x5C, 0xC2, 0xD3, 0xAC, 0x62, 0x91, 0x95, 0xE4, 0x79, 0xE7, 0xC8, 0x37, 0x6D, 0x8D, 0xD5, 0x4E, 0xA9, 0x6C, 0x56, 0xF4, 0xEA, 0x65, 0x7A, 0xAE, 0x08, 0xBA, 0x78, 0x25, 0x2E, 0x1C, 0xA6, 0xB4, 0xC6, 0xE8, 0xDD, 0x74, 0x1F, 0x4B, 0xBD, 0x8B, 0x8A, 0x70, 0x3E, 0xB5, 0x66, 0x48, 0x03, 0xF6, 0x0E, 0x61, 0x35, 0x57, 0xB9, 0x86, 0xC1, 0x1D, 0x9E, 0xE1, 0xF8, 0x98, 0x11, 0x69, 0xD9, 0x8E, 0x94, 0x9B, 0x1E, 0x87, 0xE9, 0xCE, 0x55, 0x28, 0xDF, 0x8C, 0xA1, 0x89, 0x0D, 0xBF, 0xE6, 0x42, 0x68, 0x41, 0x99, 0x2D, 0x0F, 0xB0, 0x54, 0xBB, 0x16, ) SBOX_INV = ( 0x52, 0x09, 0x6a, 0xd5, 0x30, 0x36, 0xa5, 0x38, 0xbf, 0x40, 0xa3, 0x9e, 0x81, 0xf3, 0xd7, 0xfb, 0x7c, 0xe3, 0x39, 0x82, 0x9b, 0x2f, 0xff, 0x87, 0x34, 0x8e, 0x43, 0x44, 0xc4, 0xde, 0xe9, 0xcb, 0x54, 0x7b, 0x94, 0x32, 0xa6, 0xc2, 0x23, 0x3d, 0xee, 0x4c, 0x95, 0x0b, 0x42, 0xfa, 0xc3, 0x4e, 0x08, 0x2e, 0xa1, 0x66, 0x28, 0xd9, 0x24, 0xb2, 0x76, 0x5b, 0xa2, 0x49, 0x6d, 0x8b, 0xd1, 0x25, 0x72, 0xf8, 0xf6, 0x64, 0x86, 0x68, 0x98, 0x16, 0xd4, 0xa4, 0x5c, 0xcc, 0x5d, 0x65, 0xb6, 0x92, 0x6c, 0x70, 0x48, 0x50, 0xfd, 0xed, 0xb9, 0xda, 0x5e, 0x15, 0x46, 0x57, 0xa7, 0x8d, 0x9d, 0x84, 0x90, 0xd8, 0xab, 0x00, 
0x8c, 0xbc, 0xd3, 0x0a, 0xf7, 0xe4, 0x58, 0x05, 0xb8, 0xb3, 0x45, 0x06, 0xd0, 0x2c, 0x1e, 0x8f, 0xca, 0x3f, 0x0f, 0x02, 0xc1, 0xaf, 0xbd, 0x03, 0x01, 0x13, 0x8a, 0x6b, 0x3a, 0x91, 0x11, 0x41, 0x4f, 0x67, 0xdc, 0xea, 0x97, 0xf2, 0xcf, 0xce, 0xf0, 0xb4, 0xe6, 0x73, 0x96, 0xac, 0x74, 0x22, 0xe7, 0xad, 0x35, 0x85, 0xe2, 0xf9, 0x37, 0xe8, 0x1c, 0x75, 0xdf, 0x6e, 0x47, 0xf1, 0x1a, 0x71, 0x1d, 0x29, 0xc5, 0x89, 0x6f, 0xb7, 0x62, 0x0e, 0xaa, 0x18, 0xbe, 0x1b, 0xfc, 0x56, 0x3e, 0x4b, 0xc6, 0xd2, 0x79, 0x20, 0x9a, 0xdb, 0xc0, 0xfe, 0x78, 0xcd, 0x5a, 0xf4, 0x1f, 0xdd, 0xa8, 0x33, 0x88, 0x07, 0xc7, 0x31, 0xb1, 0x12, 0x10, 0x59, 0x27, 0x80, 0xec, 0x5f, 0x60, 0x51, 0x7f, 0xa9, 0x19, 0xb5, 0x4a, 0x0d, 0x2d, 0xe5, 0x7a, 0x9f, 0x93, 0xc9, 0x9c, 0xef, 0xa0, 0xe0, 0x3b, 0x4d, 0xae, 0x2a, 0xf5, 0xb0, 0xc8, 0xeb, 0xbb, 0x3c, 0x83, 0x53, 0x99, 0x61, 0x17, 0x2b, 0x04, 0x7e, 0xba, 0x77, 0xd6, 0x26, 0xe1, 0x69, 0x14, 0x63, 0x55, 0x21, 0x0c, 0x7d ) MIX_COLUMN_MATRIX = ( (0x2, 0x3, 0x1, 0x1), (0x1, 0x2, 0x3, 0x1), (0x1, 0x1, 0x2, 0x3), (0x3, 0x1, 0x1, 0x2), ) MIX_COLUMN_MATRIX_INV = ( (0xE, 0xB, 0xD, 0x9), (0x9, 0xE, 0xB, 0xD), (0xD, 0x9, 0xE, 0xB), (0xB, 0xD, 0x9, 0xE), ) RIJNDAEL_EXP_TABLE = ( 0x01, 0x03, 0x05, 0x0F, 0x11, 0x33, 0x55, 0xFF, 0x1A, 0x2E, 0x72, 0x96, 0xA1, 0xF8, 0x13, 0x35, 0x5F, 0xE1, 0x38, 0x48, 0xD8, 0x73, 0x95, 0xA4, 0xF7, 0x02, 0x06, 0x0A, 0x1E, 0x22, 0x66, 0xAA, 0xE5, 0x34, 0x5C, 0xE4, 0x37, 0x59, 0xEB, 0x26, 0x6A, 0xBE, 0xD9, 0x70, 0x90, 0xAB, 0xE6, 0x31, 0x53, 0xF5, 0x04, 0x0C, 0x14, 0x3C, 0x44, 0xCC, 0x4F, 0xD1, 0x68, 0xB8, 0xD3, 0x6E, 0xB2, 0xCD, 0x4C, 0xD4, 0x67, 0xA9, 0xE0, 0x3B, 0x4D, 0xD7, 0x62, 0xA6, 0xF1, 0x08, 0x18, 0x28, 0x78, 0x88, 0x83, 0x9E, 0xB9, 0xD0, 0x6B, 0xBD, 0xDC, 0x7F, 0x81, 0x98, 0xB3, 0xCE, 0x49, 0xDB, 0x76, 0x9A, 0xB5, 0xC4, 0x57, 0xF9, 0x10, 0x30, 0x50, 0xF0, 0x0B, 0x1D, 0x27, 0x69, 0xBB, 0xD6, 0x61, 0xA3, 0xFE, 0x19, 0x2B, 0x7D, 0x87, 0x92, 0xAD, 0xEC, 0x2F, 0x71, 0x93, 0xAE, 0xE9, 0x20, 0x60, 0xA0, 0xFB, 0x16, 0x3A, 0x4E, 0xD2, 0x6D, 0xB7, 0xC2, 0x5D, 0xE7, 0x32, 0x56, 0xFA, 0x15, 0x3F, 0x41, 0xC3, 0x5E, 0xE2, 0x3D, 0x47, 0xC9, 0x40, 0xC0, 0x5B, 0xED, 0x2C, 0x74, 0x9C, 0xBF, 0xDA, 0x75, 0x9F, 0xBA, 0xD5, 0x64, 0xAC, 0xEF, 0x2A, 0x7E, 0x82, 0x9D, 0xBC, 0xDF, 0x7A, 0x8E, 0x89, 0x80, 0x9B, 0xB6, 0xC1, 0x58, 0xE8, 0x23, 0x65, 0xAF, 0xEA, 0x25, 0x6F, 0xB1, 0xC8, 0x43, 0xC5, 0x54, 0xFC, 0x1F, 0x21, 0x63, 0xA5, 0xF4, 0x07, 0x09, 0x1B, 0x2D, 0x77, 0x99, 0xB0, 0xCB, 0x46, 0xCA, 0x45, 0xCF, 0x4A, 0xDE, 0x79, 0x8B, 0x86, 0x91, 0xA8, 0xE3, 0x3E, 0x42, 0xC6, 0x51, 0xF3, 0x0E, 0x12, 0x36, 0x5A, 0xEE, 0x29, 0x7B, 0x8D, 0x8C, 0x8F, 0x8A, 0x85, 0x94, 0xA7, 0xF2, 0x0D, 0x17, 0x39, 0x4B, 0xDD, 0x7C, 0x84, 0x97, 0xA2, 0xFD, 0x1C, 0x24, 0x6C, 0xB4, 0xC7, 0x52, 0xF6, 0x01, ) RIJNDAEL_LOG_TABLE = ( 0x00, 0x00, 0x19, 0x01, 0x32, 0x02, 0x1a, 0xc6, 0x4b, 0xc7, 0x1b, 0x68, 0x33, 0xee, 0xdf, 0x03, 0x64, 0x04, 0xe0, 0x0e, 0x34, 0x8d, 0x81, 0xef, 0x4c, 0x71, 0x08, 0xc8, 0xf8, 0x69, 0x1c, 0xc1, 0x7d, 0xc2, 0x1d, 0xb5, 0xf9, 0xb9, 0x27, 0x6a, 0x4d, 0xe4, 0xa6, 0x72, 0x9a, 0xc9, 0x09, 0x78, 0x65, 0x2f, 0x8a, 0x05, 0x21, 0x0f, 0xe1, 0x24, 0x12, 0xf0, 0x82, 0x45, 0x35, 0x93, 0xda, 0x8e, 0x96, 0x8f, 0xdb, 0xbd, 0x36, 0xd0, 0xce, 0x94, 0x13, 0x5c, 0xd2, 0xf1, 0x40, 0x46, 0x83, 0x38, 0x66, 0xdd, 0xfd, 0x30, 0xbf, 0x06, 0x8b, 0x62, 0xb3, 0x25, 0xe2, 0x98, 0x22, 0x88, 0x91, 0x10, 0x7e, 0x6e, 0x48, 0xc3, 0xa3, 0xb6, 0x1e, 0x42, 0x3a, 0x6b, 0x28, 0x54, 0xfa, 0x85, 0x3d, 0xba, 0x2b, 0x79, 0x0a, 0x15, 0x9b, 0x9f, 0x5e, 0xca, 0x4e, 0xd4, 0xac, 0xe5, 0xf3, 0x73, 0xa7, 0x57, 0xaf, 0x58, 0xa8, 0x50, 0xf4, 0xea, 
0xd6, 0x74, 0x4f, 0xae, 0xe9, 0xd5, 0xe7, 0xe6, 0xad, 0xe8, 0x2c, 0xd7, 0x75, 0x7a, 0xeb, 0x16, 0x0b, 0xf5, 0x59, 0xcb, 0x5f, 0xb0, 0x9c, 0xa9, 0x51, 0xa0, 0x7f, 0x0c, 0xf6, 0x6f, 0x17, 0xc4, 0x49, 0xec, 0xd8, 0x43, 0x1f, 0x2d, 0xa4, 0x76, 0x7b, 0xb7, 0xcc, 0xbb, 0x3e, 0x5a, 0xfb, 0x60, 0xb1, 0x86, 0x3b, 0x52, 0xa1, 0x6c, 0xaa, 0x55, 0x29, 0x9d, 0x97, 0xb2, 0x87, 0x90, 0x61, 0xbe, 0xdc, 0xfc, 0xbc, 0x95, 0xcf, 0xcd, 0x37, 0x3f, 0x5b, 0xd1, 0x53, 0x39, 0x84, 0x3c, 0x41, 0xa2, 0x6d, 0x47, 0x14, 0x2a, 0x9e, 0x5d, 0x56, 0xf2, 0xd3, 0xab, 0x44, 0x11, 0x92, 0xd9, 0x23, 0x20, 0x2e, 0x89, 0xb4, 0x7c, 0xb8, 0x26, 0x77, 0x99, 0xe3, 0xa5, 0x67, 0x4a, 0xed, 0xde, 0xc5, 0x31, 0xfe, 0x18, 0x0d, 0x63, 0x8c, 0x80, 0xc0, 0xf7, 0x70, 0x07, ) def key_expansion(data): """ Generate key schedule @param {int[]} data 16/24/32-Byte cipher key @returns {int[]} 176/208/240-Byte expanded key """ data = data[:] # copy rcon_iteration = 1 key_size_bytes = len(data) expanded_key_size_bytes = (key_size_bytes // 4 + 7) * BLOCK_SIZE_BYTES while len(data) < expanded_key_size_bytes: temp = data[-4:] temp = key_schedule_core(temp, rcon_iteration) rcon_iteration += 1 data += xor(temp, data[-key_size_bytes: 4 - key_size_bytes]) for _ in range(3): temp = data[-4:] data += xor(temp, data[-key_size_bytes: 4 - key_size_bytes]) if key_size_bytes == 32: temp = data[-4:] temp = sub_bytes(temp) data += xor(temp, data[-key_size_bytes: 4 - key_size_bytes]) for _ in range(3 if key_size_bytes == 32 else 2 if key_size_bytes == 24 else 0): temp = data[-4:] data += xor(temp, data[-key_size_bytes: 4 - key_size_bytes]) data = data[:expanded_key_size_bytes] return data def iter_vector(iv): while True: yield iv iv = inc(iv) def sub_bytes(data): return [SBOX[x] for x in data] def sub_bytes_inv(data): return [SBOX_INV[x] for x in data] def rotate(data): return data[1:] + [data[0]] def key_schedule_core(data, rcon_iteration): data = rotate(data) data = sub_bytes(data) data[0] = data[0] ^ RCON[rcon_iteration] return data def xor(data1, data2): return [x ^ y for x, y in zip(data1, data2)] def iter_mix_columns(data, matrix): for i in (0, 4, 8, 12): for row in matrix: mixed = 0 for j in range(4): if data[i:i + 4][j] == 0 or row[j] == 0: mixed ^= 0 else: mixed ^= RIJNDAEL_EXP_TABLE[ (RIJNDAEL_LOG_TABLE[data[i + j]] + RIJNDAEL_LOG_TABLE[row[j]]) % 0xFF ] yield mixed def shift_rows(data): return [ data[((column + row) & 0b11) * 4 + row] for column in range(4) for row in range(4) ] def shift_rows_inv(data): return [ data[((column - row) & 0b11) * 4 + row] for column in range(4) for row in range(4) ] def shift_block(data): data_shifted = [] bit = 0 for n in data: if bit: n |= 0x100 bit = n & 1 n >>= 1 data_shifted.append(n) return data_shifted def inc(data): data = data[:] # copy for i in range(len(data) - 1, -1, -1): if data[i] == 255: data[i] = 0 else: data[i] = data[i] + 1 break return data def block_product(block_x, block_y): # NIST SP 800-38D, Algorithm 1 if len(block_x) != BLOCK_SIZE_BYTES or len(block_y) != BLOCK_SIZE_BYTES: raise ValueError( f"Length of blocks need to be {BLOCK_SIZE_BYTES} bytes") block_r = [0xE1] + [0] * (BLOCK_SIZE_BYTES - 1) block_v = block_y[:] block_z = [0] * BLOCK_SIZE_BYTES for i in block_x: for bit in range(7, -1, -1): if i & (1 << bit): block_z = xor(block_z, block_v) do_xor = block_v[-1] & 1 block_v = shift_block(block_v) if do_xor: block_v = xor(block_v, block_r) return block_z def ghash(subkey, data): # NIST SP 800-38D, Algorithm 2 if len(data) % BLOCK_SIZE_BYTES: raise ValueError( f"Length of data should be 
{BLOCK_SIZE_BYTES} bytes") last_y = [0] * BLOCK_SIZE_BYTES for i in range(0, len(data), BLOCK_SIZE_BYTES): block = data[i: i + BLOCK_SIZE_BYTES] last_y = block_product(xor(last_y, block), subkey) return last_y ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/archive.py0000644000175000017500000001720015040344700016515 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2024-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Download Archives""" import os import logging from . import util, formatter log = logging.getLogger("archive") def connect(path, prefix, format, table=None, mode=None, pragma=None, kwdict=None, cache_key=None): keygen = formatter.parse(prefix + format).format_map if isinstance(path, str) and path.startswith( ("postgres://", "postgresql://")): if mode == "memory": cls = DownloadArchivePostgresqlMemory else: cls = DownloadArchivePostgresql else: path = util.expand_path(path) if kwdict is not None and "{" in path: path = formatter.parse(path).format_map(kwdict) if mode == "memory": cls = DownloadArchiveMemory else: cls = DownloadArchive if kwdict is not None and table: table = formatter.parse(table).format_map(kwdict) return cls(path, keygen, table, pragma, cache_key) def sanitize(name): return '"' + name.replace('"', "_") + '"' class DownloadArchive(): _sqlite3 = None def __init__(self, path, keygen, table=None, pragma=None, cache_key=None): if self._sqlite3 is None: DownloadArchive._sqlite3 = __import__("sqlite3") try: con = self._sqlite3.connect( path, timeout=60, check_same_thread=False) except self._sqlite3.OperationalError: os.makedirs(os.path.dirname(path)) con = self._sqlite3.connect( path, timeout=60, check_same_thread=False) con.isolation_level = None self.keygen = keygen self.connection = con self.close = con.close self.cursor = cursor = con.cursor() self._cache_key = cache_key or "_archive_key" table = "archive" if table is None else sanitize(table) self._stmt_select = ( "SELECT 1 " "FROM " + table + " " "WHERE entry=? 
" "LIMIT 1") self._stmt_insert = ( "INSERT OR IGNORE INTO " + table + " " "(entry) VALUES (?)") if pragma: for stmt in pragma: cursor.execute("PRAGMA " + stmt) try: cursor.execute("CREATE TABLE IF NOT EXISTS " + table + " " "(entry TEXT PRIMARY KEY) WITHOUT ROWID") except self._sqlite3.OperationalError: # fallback for missing WITHOUT ROWID support (#553) cursor.execute("CREATE TABLE IF NOT EXISTS " + table + " " "(entry TEXT PRIMARY KEY)") def add(self, kwdict): """Add item described by 'kwdict' to archive""" key = kwdict.get(self._cache_key) or self.keygen(kwdict) self.cursor.execute(self._stmt_insert, (key,)) def check(self, kwdict): """Return True if the item described by 'kwdict' exists in archive""" key = kwdict[self._cache_key] = self.keygen(kwdict) self.cursor.execute(self._stmt_select, (key,)) return self.cursor.fetchone() def finalize(self): pass class DownloadArchiveMemory(DownloadArchive): def __init__(self, path, keygen, table=None, pragma=None, cache_key=None): DownloadArchive.__init__( self, path, keygen, table, pragma, cache_key) self.keys = set() def add(self, kwdict): self.keys.add( kwdict.get(self._cache_key) or self.keygen(kwdict)) def check(self, kwdict): key = kwdict[self._cache_key] = self.keygen(kwdict) if key in self.keys: return True self.cursor.execute(self._stmt_select, (key,)) return self.cursor.fetchone() def finalize(self): if not self.keys: return cursor = self.cursor with self.connection: try: cursor.execute("BEGIN") except self._sqlite3.OperationalError: pass stmt = self._stmt_insert if len(self.keys) < 100: for key in self.keys: cursor.execute(stmt, (key,)) else: cursor.executemany(stmt, ((key,) for key in self.keys)) class DownloadArchivePostgresql(): _psycopg = None def __init__(self, uri, keygen, table=None, pragma=None, cache_key=None): if self._psycopg is None: DownloadArchivePostgresql._psycopg = __import__("psycopg") self.connection = con = self._psycopg.connect(uri) self.cursor = cursor = con.cursor() self.close = con.close self.keygen = keygen self._cache_key = cache_key or "_archive_key" table = "archive" if table is None else sanitize(table) self._stmt_select = ( "SELECT true " "FROM " + table + " " "WHERE entry=%s " "LIMIT 1") self._stmt_insert = ( "INSERT INTO " + table + " (entry) " "VALUES (%s) " "ON CONFLICT DO NOTHING") try: cursor.execute("CREATE TABLE IF NOT EXISTS " + table + " " "(entry TEXT PRIMARY KEY)") con.commit() except Exception as exc: log.error("%s: %s when creating '%s' table: %s", con, exc.__class__.__name__, table, exc) con.rollback() raise def add(self, kwdict): key = kwdict.get(self._cache_key) or self.keygen(kwdict) try: self.cursor.execute(self._stmt_insert, (key,)) self.connection.commit() except Exception as exc: log.error("%s: %s when writing entry: %s", self.connection, exc.__class__.__name__, exc) self.connection.rollback() def check(self, kwdict): key = kwdict[self._cache_key] = self.keygen(kwdict) try: self.cursor.execute(self._stmt_select, (key,)) return self.cursor.fetchone() except Exception as exc: log.error("%s: %s when checking entry: %s", self.connection, exc.__class__.__name__, exc) self.connection.rollback() return False def finalize(self): pass class DownloadArchivePostgresqlMemory(DownloadArchivePostgresql): def __init__(self, path, keygen, table=None, pragma=None, cache_key=None): DownloadArchivePostgresql.__init__( self, path, keygen, table, pragma, cache_key) self.keys = set() def add(self, kwdict): self.keys.add( kwdict.get(self._cache_key) or self.keygen(kwdict)) def check(self, kwdict): key = 
kwdict[self._cache_key] = self.keygen(kwdict) if key in self.keys: return True try: self.cursor.execute(self._stmt_select, (key,)) return self.cursor.fetchone() except Exception as exc: log.error("%s: %s when checking entry: %s", self.connection, exc.__class__.__name__, exc) self.connection.rollback() return False def finalize(self): if not self.keys: return try: self.cursor.executemany( self._stmt_insert, ((key,) for key in self.keys)) self.connection.commit() except Exception as exc: log.error("%s: %s when writing entries: %s", self.connection, exc.__class__.__name__, exc) self.connection.rollback() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/cache.py0000644000175000017500000001446515040344700016151 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2016-2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Decorators to keep function results in an in-memory and database cache""" import sqlite3 import pickle import time import os import functools from . import config, util class CacheDecorator(): """Simplified in-memory cache""" def __init__(self, func, keyarg): self.func = func self.cache = {} self.keyarg = keyarg def __get__(self, instance, cls): return functools.partial(self.__call__, instance) def __call__(self, *args, **kwargs): key = "" if self.keyarg is None else args[self.keyarg] try: value = self.cache[key] except KeyError: value = self.cache[key] = self.func(*args, **kwargs) return value def update(self, key, value): self.cache[key] = value def invalidate(self, key=""): try: del self.cache[key] except KeyError: pass class MemoryCacheDecorator(CacheDecorator): """In-memory cache""" def __init__(self, func, keyarg, maxage): CacheDecorator.__init__(self, func, keyarg) self.maxage = maxage def __call__(self, *args, **kwargs): key = "" if self.keyarg is None else args[self.keyarg] timestamp = int(time.time()) try: value, expires = self.cache[key] except KeyError: expires = 0 if expires <= timestamp: value = self.func(*args, **kwargs) expires = timestamp + self.maxage self.cache[key] = value, expires return value def update(self, key, value): self.cache[key] = value, int(time.time()) + self.maxage class DatabaseCacheDecorator(): """Database cache""" db = None _init = True def __init__(self, func, keyarg, maxage): self.key = f"{func.__module__}.{func.__name__}" self.func = func self.cache = {} self.keyarg = keyarg self.maxage = maxage def __get__(self, obj, objtype): return functools.partial(self.__call__, obj) def __call__(self, *args, **kwargs): key = "" if self.keyarg is None else args[self.keyarg] timestamp = int(time.time()) # in-memory cache lookup try: value, expires = self.cache[key] if expires > timestamp: return value except KeyError: pass # database lookup fullkey = f"{self.key}-{key}" with self.database() as db: cursor = db.cursor() try: cursor.execute("BEGIN EXCLUSIVE") except sqlite3.OperationalError: pass # Silently swallow exception - workaround for Python 3.6 cursor.execute( "SELECT value, expires FROM data WHERE key=? 
LIMIT 1", (fullkey,), ) result = cursor.fetchone() if result and result[1] > timestamp: value, expires = result value = pickle.loads(value) else: value = self.func(*args, **kwargs) expires = timestamp + self.maxage cursor.execute( "INSERT OR REPLACE INTO data VALUES (?,?,?)", (fullkey, pickle.dumps(value), expires), ) self.cache[key] = value, expires return value def update(self, key, value): expires = int(time.time()) + self.maxage self.cache[key] = value, expires with self.database() as db: db.execute( "INSERT OR REPLACE INTO data VALUES (?,?,?)", (f"{self.key}-{key}", pickle.dumps(value), expires), ) def invalidate(self, key): try: del self.cache[key] except KeyError: pass with self.database() as db: db.execute( "DELETE FROM data WHERE key=?", (f"{self.key}-{key}",), ) def database(self): if self._init: self.db.execute( "CREATE TABLE IF NOT EXISTS data " "(key TEXT PRIMARY KEY, value TEXT, expires INTEGER)" ) DatabaseCacheDecorator._init = False return self.db def memcache(maxage=None, keyarg=None): if maxage: def wrap(func): return MemoryCacheDecorator(func, keyarg, maxage) else: def wrap(func): return CacheDecorator(func, keyarg) return wrap def cache(maxage=3600, keyarg=None): def wrap(func): return DatabaseCacheDecorator(func, keyarg, maxage) return wrap def clear(module): """Delete database entries for 'module'""" db = DatabaseCacheDecorator.db if not db: return None rowcount = 0 cursor = db.cursor() try: if module == "ALL": cursor.execute("DELETE FROM data") else: cursor.execute( "DELETE FROM data " "WHERE key LIKE 'gallery_dl.extractor.' || ? || '.%'", (module.lower(),) ) except sqlite3.OperationalError: pass # database not initialized, cannot be modified, etc. else: rowcount = cursor.rowcount db.commit() if rowcount: cursor.execute("VACUUM") return rowcount def _path(): path = config.get(("cache",), "file", util.SENTINEL) if path is not util.SENTINEL: return util.expand_path(path) if util.WINDOWS: cachedir = os.environ.get("APPDATA", "~") else: cachedir = os.environ.get("XDG_CACHE_HOME", "~/.cache") cachedir = util.expand_path(os.path.join(cachedir, "gallery-dl")) os.makedirs(cachedir, exist_ok=True) return os.path.join(cachedir, "cache.sqlite3") def _init(): try: dbfile = _path() # restrict access permissions for new db files os.close(os.open(dbfile, os.O_CREAT | os.O_RDONLY, 0o600)) DatabaseCacheDecorator.db = sqlite3.connect( dbfile, timeout=60, check_same_thread=False) except (OSError, TypeError, sqlite3.OperationalError): global cache cache = memcache _init() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/config.py0000644000175000017500000002156515040344700016352 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2015-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Global configuration module""" import sys import os.path import logging from . 
import util log = logging.getLogger("config") # -------------------------------------------------------------------- # internals _config = {} _files = [] if util.WINDOWS: _default_configs = [ r"%APPDATA%\gallery-dl\config.json", r"%USERPROFILE%\gallery-dl\config.json", r"%USERPROFILE%\gallery-dl.conf", ] else: _default_configs = [ "/etc/gallery-dl.conf", "${XDG_CONFIG_HOME}/gallery-dl/config.json" if os.environ.get("XDG_CONFIG_HOME") else "${HOME}/.config/gallery-dl/config.json", "${HOME}/.gallery-dl.conf", ] if util.EXECUTABLE: # look for config file in PyInstaller executable directory (#682) _default_configs.append(os.path.join( os.path.dirname(sys.executable), "gallery-dl.conf", )) # -------------------------------------------------------------------- # public interface def initialize(): paths = list(map(util.expand_path, _default_configs)) for path in paths: if os.access(path, os.R_OK | os.W_OK): log.error("There is already a configuration file at '%s'", path) return 1 for path in paths: try: os.makedirs(os.path.dirname(path), exist_ok=True) with open(path, "x", encoding="utf-8") as fp: fp.write("""\ { "extractor": { }, "downloader": { }, "output": { }, "postprocessor": { } } """) break except OSError as exc: log.debug("%s: %s", exc.__class__.__name__, exc) else: log.error("Unable to create a new configuration file " "at any of the default paths") return 1 log.info("Created a basic configuration file at '%s'", path) return 0 def open_extern(): for path in _default_configs: path = util.expand_path(path) if os.access(path, os.R_OK | os.W_OK): break else: log.warning("Unable to find any writable configuration file") return 1 if util.WINDOWS: openers = ("explorer", "notepad") else: openers = ("xdg-open", "open") if editor := os.environ.get("EDITOR"): openers = (editor,) + openers import shutil for opener in openers: if opener := shutil.which(opener): break else: log.warning("Unable to find a program to open '%s' with", path) return 1 log.info("Running '%s %s'", opener, path) retcode = util.Popen((opener, path)).wait() if not retcode: try: with open(path, encoding="utf-8") as fp: util.json_loads(fp.read()) except Exception as exc: log.warning("%s when parsing '%s': %s", exc.__class__.__name__, path, exc) return 2 return retcode def status(): from .output import stdout_write paths = [] for path in _default_configs: path = util.expand_path(path) try: with open(path, encoding="utf-8") as fp: util.json_loads(fp.read()) except FileNotFoundError: status = "Not Present" except OSError: status = "Inaccessible" except ValueError: status = "Invalid JSON" except Exception as exc: log.debug(exc) status = "Unknown" else: status = "OK" paths.append((path, status)) fmt = f"{{:<{max(len(p[0]) for p in paths)}}} : {{}}\n".format for path, status in paths: stdout_write(fmt(path, status)) def remap_categories(): opts = _config.get("extractor") if not opts: return cmap = opts.get("config-map") if cmap is None: cmap = ( ("coomerparty" , "coomer"), ("kemonoparty" , "kemono"), ("koharu" , "schalenetwork"), ("naver" , "naver-blog"), ("chzzk" , "naver-chzzk"), ("naverwebtoon", "naver-webtoon"), ("pixiv" , "pixiv-novel"), ) elif not cmap: return elif isinstance(cmap, dict): cmap = cmap.items() for old, new in cmap: if old in opts and new not in opts: opts[new] = opts[old] def load(files=None, strict=False, loads=util.json_loads): """Load JSON configuration files""" for pathfmt in files or _default_configs: path = util.expand_path(pathfmt) try: with open(path, encoding="utf-8") as fp: conf = loads(fp.read()) except 
OSError as exc: if strict: log.error(exc) raise SystemExit(1) except Exception as exc: log.error("%s when loading '%s': %s", exc.__class__.__name__, path, exc) if strict: raise SystemExit(2) else: if not _config: _config.update(conf) else: util.combine_dict(_config, conf) _files.append(pathfmt) if "subconfigs" in conf: if subconfigs := conf["subconfigs"]: if isinstance(subconfigs, str): subconfigs = (subconfigs,) load(subconfigs, strict, loads) def clear(): """Reset configuration to an empty state""" _config.clear() def get(path, key, default=None, conf=_config): """Get the value of property 'key' or a default value""" try: for p in path: conf = conf[p] return conf[key] except Exception: return default def interpolate(path, key, default=None, conf=_config): """Interpolate the value of 'key'""" if key in conf: return conf[key] try: for p in path: conf = conf[p] if key in conf: default = conf[key] except Exception: pass return default def interpolate_common(common, paths, key, default=None, conf=_config): """Interpolate the value of 'key' using multiple 'paths' along a 'common' ancestor """ if key in conf: return conf[key] # follow the common path try: for p in common: conf = conf[p] if key in conf: default = conf[key] except Exception: return default # try all paths until a value is found value = util.SENTINEL for path in paths: c = conf try: for p in path: c = c[p] if key in c: value = c[key] except Exception: pass if value is not util.SENTINEL: return value return default def accumulate(path, key, conf=_config): """Accumulate the values of 'key' along 'path'""" result = [] try: if key in conf: if value := conf[key]: if isinstance(value, list): result.extend(value) else: result.append(value) for p in path: conf = conf[p] if key in conf: if value := conf[key]: if isinstance(value, list): result[:0] = value else: result.insert(0, value) except Exception: pass return result def set(path, key, value, conf=_config): """Set the value of property 'key' for this session""" for p in path: try: conf = conf[p] except KeyError: conf[p] = conf = {} conf[key] = value def setdefault(path, key, value, conf=_config): """Set the value of property 'key' if it doesn't exist""" for p in path: try: conf = conf[p] except KeyError: conf[p] = conf = {} return conf.setdefault(key, value) def unset(path, key, conf=_config): """Unset the value of property 'key'""" try: for p in path: conf = conf[p] del conf[key] except Exception: pass class apply(): """Context Manager: apply a collection of key-value pairs""" def __init__(self, kvlist): self.original = [] self.kvlist = kvlist def __enter__(self): for path, key, value in self.kvlist: self.original.append((path, key, get(path, key, util.SENTINEL))) set(path, key, value) def __exit__(self, exc_type, exc_value, traceback): self.original.reverse() for path, key, value in self.original: if value is util.SENTINEL: unset(path, key) else: set(path, key, value) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/cookies.py0000644000175000017500000011542315040344700016536 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2022-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. # Adapted from yt-dlp's cookies module. 
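# Illustrative note (not part of the original module): load_cookies() below
# receives the 5-tuple (browser, profile, keyring, container, domain) that
# __init__.main() assembles from --cookies-from-browser, whose syntax is
# roughly BROWSER[/DOMAIN][+KEYRING][:PROFILE[::CONTAINER]] as parsed there.
# The values in this sketch are placeholders.
#
#   from gallery_dl.cookies import load_cookies
#   # roughly equivalent to: --cookies-from-browser firefox/example.org::Personal
#   cookies = load_cookies(("firefox", None, None, "Personal", "example.org"))
#   for c in cookies:
#       print(c.domain, c.name, c.value)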
# https://github.com/yt-dlp/yt-dlp/blob/master/yt_dlp/cookies.py import binascii import ctypes import logging import os import shutil import sqlite3 import struct import subprocess import sys import tempfile from hashlib import pbkdf2_hmac from http.cookiejar import Cookie from . import aes, text, util SUPPORTED_BROWSERS_CHROMIUM = { "brave", "chrome", "chromium", "edge", "opera", "thorium", "vivaldi"} SUPPORTED_BROWSERS_FIREFOX = {"firefox", "librewolf", "zen"} SUPPORTED_BROWSERS = \ SUPPORTED_BROWSERS_CHROMIUM | SUPPORTED_BROWSERS_FIREFOX | {"safari"} logger = logging.getLogger("cookies") def load_cookies(browser_specification): browser_name, profile, keyring, container, domain = \ _parse_browser_specification(*browser_specification) if browser_name in SUPPORTED_BROWSERS_FIREFOX: return load_cookies_firefox(browser_name, profile, container, domain) elif browser_name == "safari": return load_cookies_safari(profile, domain) elif browser_name in SUPPORTED_BROWSERS_CHROMIUM: return load_cookies_chromium(browser_name, profile, keyring, domain) else: raise ValueError(f"unknown browser '{browser_name}'") def load_cookies_firefox(browser_name, profile=None, container=None, domain=None): path, container_id = _firefox_cookies_database(browser_name, profile, container) sql = ("SELECT name, value, host, path, isSecure, expiry " "FROM moz_cookies") conditions = [] parameters = [] if container_id is False: conditions.append("NOT INSTR(originAttributes,'userContextId=')") elif container_id: uid = f"%userContextId={container_id}" conditions.append("originAttributes LIKE ? OR originAttributes LIKE ?") parameters += (uid, uid + "&%") if domain: if domain[0] == ".": conditions.append("host == ? OR host LIKE ?") parameters += (domain[1:], "%" + domain) else: conditions.append("host == ? OR host == ?") parameters += (domain, "." + domain) if conditions: sql = f"{sql} WHERE ( {' ) AND ( '.join(conditions)} )" with DatabaseConnection(path) as db: cookies = [ Cookie( 0, name, value, None, False, domain, True if domain else False, domain[0] == "." if domain else False, path, True if path else False, secure, expires, False, None, None, {}, ) for name, value, domain, path, secure, expires in db.execute( sql, parameters) ] _log_info("Extracted %s cookies from %s", len(cookies), browser_name.capitalize()) return cookies def load_cookies_safari(profile=None, domain=None): """Ref.: https://github.com/libyal/dtformats/blob /main/documentation/Safari%20Cookies.asciidoc - This data appears to be out of date but the important parts of the database structure is the same - There are a few bytes here and there which are skipped during parsing """ with _safari_cookies_database() as fp: data = fp.read() page_sizes, body_start = _safari_parse_cookies_header(data) p = DataParser(data[body_start:]) cookies = [] for page_size in page_sizes: _safari_parse_cookies_page(p.read_bytes(page_size), cookies) _log_info("Extracted %s cookies from Safari", len(cookies)) return cookies def load_cookies_chromium(browser_name, profile=None, keyring=None, domain=None): config = _chromium_browser_settings(browser_name) path = _chromium_cookies_database(profile, config) _log_debug("Extracting cookies from %s", path) if domain: if domain[0] == ".": condition = " WHERE host_key == ? OR host_key LIKE ?" parameters = (domain[1:], "%" + domain) else: condition = " WHERE host_key == ? OR host_key == ?" parameters = (domain, "." 
+ domain) else: condition = "" parameters = () with DatabaseConnection(path) as db: db.text_factory = bytes cursor = db.cursor() try: meta_version = int(cursor.execute( "SELECT value FROM meta WHERE key = 'version'").fetchone()[0]) except Exception as exc: _log_warning("Failed to get cookie database meta version (%s: %s)", exc.__class__.__name__, exc) meta_version = 0 try: rows = cursor.execute( "SELECT host_key, name, value, encrypted_value, path, " "expires_utc, is_secure FROM cookies" + condition, parameters) except sqlite3.OperationalError: rows = cursor.execute( "SELECT host_key, name, value, encrypted_value, path, " "expires_utc, secure FROM cookies" + condition, parameters) failed_cookies = 0 unencrypted_cookies = 0 decryptor = _chromium_cookie_decryptor( config["directory"], config["keyring"], keyring, meta_version) cookies = [] for domain, name, value, enc_value, path, expires, secure in rows: if not value and enc_value: # encrypted value = decryptor.decrypt(enc_value) if value is None: failed_cookies += 1 continue else: value = value.decode() unencrypted_cookies += 1 if expires: # https://stackoverflow.com/a/43520042 expires = int(expires) // 1000000 - 11644473600 else: expires = None domain = domain.decode() path = path.decode() name = name.decode() cookies.append(Cookie( 0, name, value, None, False, domain, True if domain else False, domain[0] == "." if domain else False, path, True if path else False, secure, expires, False, None, None, {}, )) if failed_cookies > 0: failed_message = f" ({failed_cookies} could not be decrypted)" else: failed_message = "" _log_info("Extracted %s cookies from %s%s", len(cookies), browser_name.capitalize(), failed_message) counts = decryptor.cookie_counts counts["unencrypted"] = unencrypted_cookies _log_debug("version breakdown: %s", counts) return cookies # -------------------------------------------------------------------- # firefox def _firefox_cookies_database(browser_name, profile=None, container=None): if not profile: search_root = _firefox_browser_directory(browser_name) elif _is_path(profile): search_root = profile else: search_root = os.path.join( _firefox_browser_directory(browser_name), profile) path = _find_most_recently_used_file(search_root, "cookies.sqlite") if path is None: raise FileNotFoundError(f"Unable to find {browser_name.capitalize()} " f"cookies database in {search_root}") _log_debug("Extracting cookies from %s", path) if not container or container == "none": container_id = False _log_debug("Only loading cookies not belonging to any container") elif container == "all": container_id = None else: containers_path = os.path.join( os.path.dirname(path), "containers.json") try: with open(containers_path) as fp: identities = util.json_loads(fp.read())["identities"] except OSError: _log_error("Unable to read Firefox container database at '%s'", containers_path) raise except KeyError: identities = () for context in identities: if container == context.get("name") or container == text.extr( context.get("l10nID", ""), "userContext", ".label"): container_id = context["userContextId"] break else: raise ValueError(f"Unable to find Firefox container '{container}'") _log_debug("Only loading cookies from container '%s' (ID %s)", container, container_id) return path, container_id def _firefox_browser_directory(browser_name): join = os.path.join if sys.platform in ("win32", "cygwin"): appdata = os.path.expandvars("%APPDATA%") return { "firefox" : join(appdata, R"Mozilla\Firefox\Profiles"), "librewolf": join(appdata, R"librewolf\Profiles"), 
"zen" : join(appdata, R"zen\Profiles"), }[browser_name] elif sys.platform == "darwin": appdata = os.path.expanduser("~/Library/Application Support") return { "firefox" : join(appdata, R"Firefox/Profiles"), "librewolf": join(appdata, R"librewolf/Profiles"), "zen" : join(appdata, R"zen/Profiles"), }[browser_name] else: home = os.path.expanduser("~") return { "firefox" : join(home, R".mozilla/firefox"), "librewolf": join(home, R".librewolf"), "zen" : join(home, R".zen"), }[browser_name] # -------------------------------------------------------------------- # safari def _safari_cookies_database(): try: path = os.path.expanduser("~/Library/Cookies/Cookies.binarycookies") return open(path, "rb") except FileNotFoundError: _log_debug("Trying secondary cookie location") path = os.path.expanduser("~/Library/Containers/com.apple.Safari/Data" "/Library/Cookies/Cookies.binarycookies") return open(path, "rb") def _safari_parse_cookies_header(data): p = DataParser(data) p.expect_bytes(b"cook", "database signature") number_of_pages = p.read_uint(big_endian=True) page_sizes = [p.read_uint(big_endian=True) for _ in range(number_of_pages)] return page_sizes, p.cursor def _safari_parse_cookies_page(data, cookies, domain=None): p = DataParser(data) p.expect_bytes(b"\x00\x00\x01\x00", "page signature") number_of_cookies = p.read_uint() record_offsets = [p.read_uint() for _ in range(number_of_cookies)] if number_of_cookies == 0: _log_debug("Cookies page of size %s has no cookies", len(data)) return p.skip_to(record_offsets[0], "unknown page header field") for i, record_offset in enumerate(record_offsets): p.skip_to(record_offset, "space between records") record_length = _safari_parse_cookies_record( data[record_offset:], cookies, domain) p.read_bytes(record_length) p.skip_to_end("space in between pages") def _safari_parse_cookies_record(data, cookies, host=None): p = DataParser(data) record_size = p.read_uint() p.skip(4, "unknown record field 1") flags = p.read_uint() is_secure = True if (flags & 0x0001) else False p.skip(4, "unknown record field 2") domain_offset = p.read_uint() name_offset = p.read_uint() path_offset = p.read_uint() value_offset = p.read_uint() p.skip(8, "unknown record field 3") expiration_date = _mac_absolute_time_to_posix(p.read_double()) _creation_date = _mac_absolute_time_to_posix(p.read_double()) # noqa: F841 try: p.skip_to(domain_offset) domain = p.read_cstring() if host: if host[0] == ".": if host[1:] != domain and not domain.endswith(host): return record_size else: if host != domain and ("." + host) != domain: return record_size p.skip_to(name_offset) name = p.read_cstring() p.skip_to(path_offset) path = p.read_cstring() p.skip_to(value_offset) value = p.read_cstring() except UnicodeDecodeError: _log_warning("Failed to parse Safari cookie") return record_size p.skip_to(record_size, "space at the end of the record") cookies.append(Cookie( 0, name, value, None, False, domain, True if domain else False, domain[0] == "." 
if domain else False, path, True if path else False, is_secure, expiration_date, False, None, None, {}, )) return record_size # -------------------------------------------------------------------- # chromium def _chromium_cookies_database(profile, config): if profile is None: search_root = config["directory"] elif _is_path(profile): search_root = profile config["directory"] = (os.path.dirname(profile) if config["profiles"] else profile) elif config["profiles"]: search_root = os.path.join(config["directory"], profile) else: _log_warning("%s does not support profiles", config["browser"]) search_root = config["directory"] path = _find_most_recently_used_file(search_root, "Cookies") if path is None: raise FileNotFoundError(f"Unable to find {config['browser']} cookies " f"database in '{search_root}'") return path def _chromium_browser_settings(browser_name): # https://chromium.googlesource.com/chromium # /src/+/HEAD/docs/user_data_dir.md join = os.path.join if sys.platform in ("win32", "cygwin"): appdata_local = os.path.expandvars("%LOCALAPPDATA%") appdata_roaming = os.path.expandvars("%APPDATA%") browser_dir = { "brave" : join(appdata_local, R"BraveSoftware\Brave-Browser\User Data"), "chrome" : join(appdata_local, R"Google\Chrome\User Data"), "chromium": join(appdata_local, R"Chromium\User Data"), "edge" : join(appdata_local, R"Microsoft\Edge\User Data"), "opera" : join(appdata_roaming, R"Opera Software\Opera Stable"), "thorium" : join(appdata_local, R"Thorium\User Data"), "vivaldi" : join(appdata_local, R"Vivaldi\User Data"), }[browser_name] elif sys.platform == "darwin": appdata = os.path.expanduser("~/Library/Application Support") browser_dir = { "brave" : join(appdata, "BraveSoftware/Brave-Browser"), "chrome" : join(appdata, "Google/Chrome"), "chromium": join(appdata, "Chromium"), "edge" : join(appdata, "Microsoft Edge"), "opera" : join(appdata, "com.operasoftware.Opera"), "thorium" : join(appdata, "Thorium"), "vivaldi" : join(appdata, "Vivaldi"), }[browser_name] else: config = (os.environ.get("XDG_CONFIG_HOME") or os.path.expanduser("~/.config")) browser_dir = { "brave" : join(config, "BraveSoftware/Brave-Browser"), "chrome" : join(config, "google-chrome"), "chromium": join(config, "chromium"), "edge" : join(config, "microsoft-edge"), "opera" : join(config, "opera"), "thorium" : join(config, "Thorium"), "vivaldi" : join(config, "vivaldi"), }[browser_name] # Linux keyring names can be determined by snooping on dbus # while opening the browser in KDE: # dbus-monitor "interface="org.kde.KWallet"" "type=method_return" keyring_name = { "brave" : "Brave", "chrome" : "Chrome", "chromium": "Chromium", "edge" : "Microsoft Edge" if sys.platform == "darwin" else "Chromium", "opera" : "Opera" if sys.platform == "darwin" else "Chromium", "thorium" : "Thorium", "vivaldi" : "Vivaldi" if sys.platform == "darwin" else "Chrome", }[browser_name] browsers_without_profiles = {"opera"} return { "browser" : browser_name, "directory": browser_dir, "keyring" : keyring_name, "profiles" : browser_name not in browsers_without_profiles } def _chromium_cookie_decryptor( browser_root, browser_keyring_name, keyring=None, meta_version=0): if sys.platform in ("win32", "cygwin"): return WindowsChromiumCookieDecryptor( browser_root, meta_version) elif sys.platform == "darwin": return MacChromiumCookieDecryptor( browser_keyring_name, meta_version) else: return LinuxChromiumCookieDecryptor( browser_keyring_name, keyring, meta_version) class ChromiumCookieDecryptor: """ Overview: Linux: - cookies are either v10 or v11 - v10: 
AES-CBC encrypted with a fixed key - v11: AES-CBC encrypted with an OS protected key (keyring) - v11 keys can be stored in various places depending on the activate desktop environment [2] Mac: - cookies are either v10 or not v10 - v10: AES-CBC encrypted with an OS protected key (keyring) and more key derivation iterations than linux - not v10: "old data" stored as plaintext Windows: - cookies are either v10 or not v10 - v10: AES-GCM encrypted with a key which is encrypted with DPAPI - not v10: encrypted with DPAPI Sources: - [1] https://chromium.googlesource.com/chromium/src/+/refs/heads /main/components/os_crypt/ - [2] https://chromium.googlesource.com/chromium/src/+/refs/heads /main/components/os_crypt/key_storage_linux.cc - KeyStorageLinux::CreateService """ def decrypt(self, encrypted_value): raise NotImplementedError("Must be implemented by sub classes") @property def cookie_counts(self): raise NotImplementedError("Must be implemented by sub classes") class LinuxChromiumCookieDecryptor(ChromiumCookieDecryptor): def __init__(self, browser_keyring_name, keyring=None, meta_version=0): password = _get_linux_keyring_password(browser_keyring_name, keyring) self._empty_key = self.derive_key(b"") self._v10_key = self.derive_key(b"peanuts") self._v11_key = None if password is None else self.derive_key(password) self._cookie_counts = {"v10": 0, "v11": 0, "other": 0} self._offset = (32 if meta_version >= 24 else 0) def derive_key(self, password): # values from # https://chromium.googlesource.com/chromium/src/+/refs/heads # /main/components/os_crypt/os_crypt_linux.cc return pbkdf2_sha1(password, salt=b"saltysalt", iterations=1, key_length=16) @property def cookie_counts(self): return self._cookie_counts def decrypt(self, encrypted_value): version = encrypted_value[:3] ciphertext = encrypted_value[3:] if version == b"v10": self._cookie_counts["v10"] += 1 value = _decrypt_aes_cbc(ciphertext, self._v10_key, self._offset) elif version == b"v11": self._cookie_counts["v11"] += 1 if self._v11_key is None: _log_warning("Unable to decrypt v11 cookies: no key found") return None value = _decrypt_aes_cbc(ciphertext, self._v11_key, self._offset) else: self._cookie_counts["other"] += 1 return None if value is None: value = _decrypt_aes_cbc(ciphertext, self._empty_key, self._offset) if value is None: _log_warning("Failed to decrypt cookie (AES-CBC)") return value class MacChromiumCookieDecryptor(ChromiumCookieDecryptor): def __init__(self, browser_keyring_name, meta_version=0): password = _get_mac_keyring_password(browser_keyring_name) self._v10_key = None if password is None else self.derive_key(password) self._cookie_counts = {"v10": 0, "other": 0} self._offset = (32 if meta_version >= 24 else 0) def derive_key(self, password): # values from # https://chromium.googlesource.com/chromium/src/+/refs/heads # /main/components/os_crypt/os_crypt_mac.mm return pbkdf2_sha1(password, salt=b"saltysalt", iterations=1003, key_length=16) @property def cookie_counts(self): return self._cookie_counts def decrypt(self, encrypted_value): version = encrypted_value[:3] ciphertext = encrypted_value[3:] if version == b"v10": self._cookie_counts["v10"] += 1 if self._v10_key is None: _log_warning("Unable to decrypt v10 cookies: no key found") return None return _decrypt_aes_cbc(ciphertext, self._v10_key, self._offset) else: self._cookie_counts["other"] += 1 # other prefixes are considered "old data", # which were stored as plaintext # https://chromium.googlesource.com/chromium/src/+/refs/heads # 
/main/components/os_crypt/os_crypt_mac.mm return encrypted_value class WindowsChromiumCookieDecryptor(ChromiumCookieDecryptor): def __init__(self, browser_root, meta_version=0): self._v10_key = _get_windows_v10_key(browser_root) self._cookie_counts = {"v10": 0, "other": 0} self._offset = (32 if meta_version >= 24 else 0) @property def cookie_counts(self): return self._cookie_counts def decrypt(self, encrypted_value): version = encrypted_value[:3] ciphertext = encrypted_value[3:] if version == b"v10": self._cookie_counts["v10"] += 1 if self._v10_key is None: _log_warning("Unable to decrypt v10 cookies: no key found") return None # https://chromium.googlesource.com/chromium/src/+/refs/heads # /main/components/os_crypt/os_crypt_win.cc # kNonceLength nonce_length = 96 // 8 # boringssl # EVP_AEAD_AES_GCM_TAG_LEN authentication_tag_length = 16 raw_ciphertext = ciphertext nonce = raw_ciphertext[:nonce_length] ciphertext = raw_ciphertext[ nonce_length:-authentication_tag_length] authentication_tag = raw_ciphertext[-authentication_tag_length:] return _decrypt_aes_gcm( ciphertext, self._v10_key, nonce, authentication_tag, self._offset) else: self._cookie_counts["other"] += 1 # any other prefix means the data is DPAPI encrypted # https://chromium.googlesource.com/chromium/src/+/refs/heads # /main/components/os_crypt/os_crypt_win.cc return _decrypt_windows_dpapi(encrypted_value).decode() # -------------------------------------------------------------------- # keyring def _choose_linux_keyring(): """ https://chromium.googlesource.com/chromium/src/+/refs/heads /main/components/os_crypt/key_storage_util_linux.cc SelectBackend """ desktop_environment = _get_linux_desktop_environment(os.environ) _log_debug("Detected desktop environment: %s", desktop_environment) if desktop_environment == DE_KDE: return KEYRING_KWALLET if desktop_environment == DE_OTHER: return KEYRING_BASICTEXT return KEYRING_GNOMEKEYRING def _get_kwallet_network_wallet(): """ The name of the wallet used to store network passwords. https://chromium.googlesource.com/chromium/src/+/refs/heads /main/components/os_crypt/kwallet_dbus.cc KWalletDBus::NetworkWallet which does a dbus call to the following function: https://api.kde.org/frameworks/kwallet/html/classKWallet_1_1Wallet.html Wallet::NetworkWallet """ default_wallet = "kdewallet" try: proc, stdout = Popen_communicate( "dbus-send", "--session", "--print-reply=literal", "--dest=org.kde.kwalletd5", "/modules/kwalletd5", "org.kde.KWallet.networkWallet" ) if proc.returncode != 0: _log_warning("Failed to read NetworkWallet") return default_wallet else: network_wallet = stdout.decode().strip() _log_debug("NetworkWallet = '%s'", network_wallet) return network_wallet except Exception as exc: _log_warning("Error while obtaining NetworkWallet (%s: %s)", exc.__class__.__name__, exc) return default_wallet def _get_kwallet_password(browser_keyring_name): _log_debug("Using kwallet-query to obtain password from kwallet") if shutil.which("kwallet-query") is None: _log_error( "kwallet-query command not found. KWallet and kwallet-query " "must be installed to read from KWallet. kwallet-query should be " "included in the kwallet package for your distribution") return b"" network_wallet = _get_kwallet_network_wallet() try: proc, stdout = Popen_communicate( "kwallet-query", "--read-password", browser_keyring_name + " Safe Storage", "--folder", browser_keyring_name + " Keys", network_wallet, ) if proc.returncode != 0: _log_error(f"kwallet-query failed with return code " f"{proc.returncode}. 
Please consult the kwallet-query " f"man page for details") return b"" if stdout.lower().startswith(b"failed to read"): _log_debug("Failed to read password from kwallet. " "Using empty string instead") # This sometimes occurs in KDE because chrome does not check # hasEntry and instead just tries to read the value (which # kwallet returns "") whereas kwallet-query checks hasEntry. # To verify this: # dbus-monitor "interface="org.kde.KWallet"" "type=method_return" # while starting chrome. # This may be a bug, as the intended behaviour is to generate a # random password and store it, but that doesn't matter here. return b"" else: if stdout[-1:] == b"\n": stdout = stdout[:-1] return stdout except Exception as exc: _log_warning("Error when running kwallet-query (%s: %s)", exc.__class__.__name__, exc) return b"" def _get_gnome_keyring_password(browser_keyring_name): try: import secretstorage except ImportError: _log_error("'secretstorage' Python package not available") return b"" # Gnome keyring does not seem to organise keys in the same way as KWallet, # using `dbus-monitor` during startup, it can be observed that chromium # lists all keys and presumably searches for its key in the list. # It appears that we must do the same. # https://github.com/jaraco/keyring/issues/556 con = secretstorage.dbus_init() try: col = secretstorage.get_default_collection(con) label = browser_keyring_name + " Safe Storage" for item in col.get_all_items(): if item.get_label() == label: return item.get_secret() else: _log_error("Failed to read from GNOME keyring") return b"" finally: con.close() def _get_linux_keyring_password(browser_keyring_name, keyring): # Note: chrome/chromium can be run with the following flags # to determine which keyring backend it has chosen to use # - chromium --enable-logging=stderr --v=1 2>&1 | grep key_storage_ # # Chromium supports --password-store= # so the automatic detection will not be sufficient in all cases. 
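    # Illustrative sketch (added commentary, not original code): the 'keyring'
    # override arrives here from the browser specification and, when given,
    # must be one of SUPPORTED_KEYRINGS defined further below
    # ("kwallet", "gnomekeyring", "basictext"). A hypothetical caller forcing
    # KWallet instead of relying on auto-detection:
    #
    #     password = _get_linux_keyring_password("Chromium", "kwallet")
    #
    # Without an override, _choose_linux_keyring() picks a backend from the
    # detected desktop environment: KDE -> kwallet, unknown/other -> basictext,
    # everything else -> gnomekeyring.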
if not keyring: keyring = _choose_linux_keyring() _log_debug("Chosen keyring: %s", keyring) if keyring == KEYRING_KWALLET: return _get_kwallet_password(browser_keyring_name) elif keyring == KEYRING_GNOMEKEYRING: return _get_gnome_keyring_password(browser_keyring_name) elif keyring == KEYRING_BASICTEXT: # when basic text is chosen, all cookies are stored as v10 # so no keyring password is required return None assert False, "Unknown keyring " + keyring def _get_mac_keyring_password(browser_keyring_name): _log_debug("Using find-generic-password to obtain " "password from OSX keychain") try: proc, stdout = Popen_communicate( "security", "find-generic-password", "-w", # write password to stdout "-a", browser_keyring_name, # match "account" "-s", browser_keyring_name + " Safe Storage", # match "service" ) if stdout[-1:] == b"\n": stdout = stdout[:-1] return stdout except Exception as exc: _log_warning("Error when using find-generic-password (%s: %s)", exc.__class__.__name__, exc) return None def _get_windows_v10_key(browser_root): path = _find_most_recently_used_file(browser_root, "Local State") if path is None: _log_error("Unable to find Local State file") return None _log_debug("Found Local State file at '%s'", path) with open(path, encoding="utf-8") as fp: data = util.json_loads(fp.read()) try: base64_key = data["os_crypt"]["encrypted_key"] except KeyError: _log_error("Unable to find encrypted key in Local State") return None encrypted_key = binascii.a2b_base64(base64_key) prefix = b"DPAPI" if not encrypted_key.startswith(prefix): _log_error("Invalid Local State key") return None return _decrypt_windows_dpapi(encrypted_key[len(prefix):]) # -------------------------------------------------------------------- # utility class ParserError(Exception): pass class DataParser: def __init__(self, data): self.cursor = 0 self._data = data def read_bytes(self, num_bytes): if num_bytes < 0: raise ParserError(f"invalid read of {num_bytes} bytes") end = self.cursor + num_bytes if end > len(self._data): raise ParserError("reached end of input") data = self._data[self.cursor:end] self.cursor = end return data def expect_bytes(self, expected_value, message): value = self.read_bytes(len(expected_value)) if value != expected_value: raise ParserError(f"unexpected value: {value} != {expected_value} " f"({message})") def read_uint(self, big_endian=False): data_format = ">I" if big_endian else " 0: _log_debug(f"Skipping {num_bytes} bytes ({description}): " f"{self.read_bytes(num_bytes)!r}") elif num_bytes < 0: raise ParserError(f"Invalid skip of {num_bytes} bytes") def skip_to(self, offset, description="unknown"): self.skip(offset - self.cursor, description) def skip_to_end(self, description="unknown"): self.skip_to(len(self._data), description) class DatabaseConnection(): def __init__(self, path): self.path = path self.database = None self.directory = None def __enter__(self): try: # https://www.sqlite.org/uri.html#the_uri_path path = self.path.replace("?", "%3f").replace("#", "%23") if util.WINDOWS: path = "/" + os.path.abspath(path) uri = f"file:{path}?mode=ro&immutable=1" self.database = sqlite3.connect( uri, uri=True, isolation_level=None, check_same_thread=False) return self.database except Exception as exc: _log_debug("Falling back to temporary database copy (%s: %s)", exc.__class__.__name__, exc) try: self.directory = tempfile.TemporaryDirectory(prefix="gallery-dl-") path_copy = os.path.join(self.directory.name, "copy.sqlite") shutil.copyfile(self.path, path_copy) self.database = sqlite3.connect( path_copy, 
isolation_level=None, check_same_thread=False) return self.database except BaseException: if self.directory: self.directory.cleanup() raise def __exit__(self, exc_type, exc_value, traceback): self.database.close() if self.directory: self.directory.cleanup() def Popen_communicate(*args): proc = util.Popen( args, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL) try: stdout, stderr = proc.communicate() except BaseException: # Including KeyboardInterrupt proc.kill() proc.wait() raise return proc, stdout """ https://chromium.googlesource.com/chromium/src/+/refs/heads /main/base/nix/xdg_util.h - DesktopEnvironment """ DE_OTHER = "other" DE_CINNAMON = "cinnamon" DE_GNOME = "gnome" DE_KDE = "kde" DE_PANTHEON = "pantheon" DE_UNITY = "unity" DE_XFCE = "xfce" """ https://chromium.googlesource.com/chromium/src/+/refs/heads /main/components/os_crypt/key_storage_util_linux.h - SelectedLinuxBackend """ KEYRING_KWALLET = "kwallet" KEYRING_GNOMEKEYRING = "gnomekeyring" KEYRING_BASICTEXT = "basictext" SUPPORTED_KEYRINGS = {"kwallet", "gnomekeyring", "basictext"} def _get_linux_desktop_environment(env): """ Ref: https://chromium.googlesource.com/chromium/src/+/refs/heads /main/base/nix/xdg_util.cc - GetDesktopEnvironment """ xdg_current_desktop = env.get("XDG_CURRENT_DESKTOP") desktop_session = env.get("DESKTOP_SESSION") if xdg_current_desktop: xdg_current_desktop = (xdg_current_desktop.partition(":")[0] .strip().lower()) if xdg_current_desktop == "unity": if desktop_session and "gnome-fallback" in desktop_session: return DE_GNOME else: return DE_UNITY elif xdg_current_desktop == "gnome": return DE_GNOME elif xdg_current_desktop == "x-cinnamon": return DE_CINNAMON elif xdg_current_desktop == "kde": return DE_KDE elif xdg_current_desktop == "pantheon": return DE_PANTHEON elif xdg_current_desktop == "xfce": return DE_XFCE if desktop_session: if desktop_session in ("mate", "gnome"): return DE_GNOME if "kde" in desktop_session: return DE_KDE if "xfce" in desktop_session: return DE_XFCE if "GNOME_DESKTOP_SESSION_ID" in env: return DE_GNOME if "KDE_FULL_SESSION" in env: return DE_KDE return DE_OTHER def _mac_absolute_time_to_posix(timestamp): # 978307200 is timestamp of 2001-01-01 00:00:00 return 978307200 + int(timestamp) def pbkdf2_sha1(password, salt, iterations, key_length): return pbkdf2_hmac("sha1", password, salt, iterations, key_length) def _decrypt_aes_cbc(ciphertext, key, offset=0, initialization_vector=b" " * 16): plaintext = aes.unpad_pkcs7(aes.aes_cbc_decrypt_bytes( ciphertext, key, initialization_vector)) if offset: plaintext = plaintext[offset:] try: return plaintext.decode() except UnicodeDecodeError: return None def _decrypt_aes_gcm(ciphertext, key, nonce, authentication_tag, offset=0): try: plaintext = aes.aes_gcm_decrypt_and_verify_bytes( ciphertext, key, authentication_tag, nonce) if offset: plaintext = plaintext[offset:] return plaintext.decode() except UnicodeDecodeError: _log_warning("Failed to decrypt cookie (AES-GCM Unicode)") except ValueError: _log_warning("Failed to decrypt cookie (AES-GCM MAC)") return None def _decrypt_windows_dpapi(ciphertext): """ References: - https://docs.microsoft.com/en-us/windows /win32/api/dpapi/nf-dpapi-cryptunprotectdata """ from ctypes.wintypes import DWORD class DATA_BLOB(ctypes.Structure): _fields_ = [("cbData", DWORD), ("pbData", ctypes.POINTER(ctypes.c_char))] buffer = ctypes.create_string_buffer(ciphertext) blob_in = DATA_BLOB(ctypes.sizeof(buffer), buffer) blob_out = DATA_BLOB() ret = ctypes.windll.crypt32.CryptUnprotectData( ctypes.byref(blob_in), 
# pDataIn None, # ppszDataDescr: human readable description of pDataIn None, # pOptionalEntropy: salt? None, # pvReserved: must be NULL None, # pPromptStruct: information about prompts to display 0, # dwFlags ctypes.byref(blob_out) # pDataOut ) if not ret: _log_warning("Failed to decrypt cookie (DPAPI)") return None result = ctypes.string_at(blob_out.pbData, blob_out.cbData) ctypes.windll.kernel32.LocalFree(blob_out.pbData) return result def _find_most_recently_used_file(root, filename): # if the provided root points to an exact profile path # check if it contains the wanted filename first_choice = os.path.join(root, filename) if os.path.exists(first_choice): return first_choice # if there are multiple browser profiles, take the most recently used one paths = [] for curr_root, dirs, files in os.walk(root): for file in files: if file == filename: paths.append(os.path.join(curr_root, file)) if not paths: return None return max(paths, key=lambda path: os.lstat(path).st_mtime) def _is_path(value): return os.path.sep in value def _parse_browser_specification( browser, profile=None, keyring=None, container=None, domain=None): browser = browser.lower() if browser not in SUPPORTED_BROWSERS: raise ValueError(f"Unsupported browser '{browser}'") if keyring and keyring not in SUPPORTED_KEYRINGS: raise ValueError(f"Unsupported keyring '{keyring}'") if profile and _is_path(profile): profile = os.path.expanduser(profile) return browser, profile, keyring, container, domain _log_cache = set() _log_debug = logger.debug _log_info = logger.info def _log_warning(msg, *args): if msg not in _log_cache: _log_cache.add(msg) logger.warning(msg, *args) def _log_error(msg, *args): if msg not in _log_cache: _log_cache.add(msg) logger.error(msg, *args) ././@PaxHeader0000000000000000000000000000003300000000000010211 xustar0027 mtime=1753638554.050738 gallery_dl-1.30.2/gallery_dl/downloader/0000755000175000017500000000000015041463232016663 5ustar00mikemike././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1743510442.0 gallery_dl-1.30.2/gallery_dl/downloader/__init__.py0000644000175000017500000000176414772755652021030 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2015-2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Downloader modules""" modules = [ "http", "text", "ytdl", ] def find(scheme): """Return downloader class suitable for handling the given scheme""" try: return _cache[scheme] except KeyError: pass cls = None if scheme == "https": scheme = "http" if scheme in modules: # prevent unwanted imports try: module = __import__(scheme, globals(), None, (), 1) except ImportError: pass else: cls = module.__downloader__ if scheme == "http": _cache["http"] = _cache["https"] = cls else: _cache[scheme] = cls return cls # -------------------------------------------------------------------- # internals _cache = {} ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/downloader/common.py0000644000175000017500000000616515040344700020532 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2014-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. 
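#
# Illustrative usage sketch (added commentary, not original code): the sibling
# downloader/__init__.py above exposes find(scheme), which resolves a URL
# scheme to a downloader class and caches the result, treating "https" as an
# alias for "http". A hypothetical lookup; 'job' stands in for a job object
# providing the attributes DownloaderBase expects:
#
#     from gallery_dl import downloader
#
#     cls = downloader.find("https")      # -> HttpDownloader, or None if the
#                                         #    module could not be imported
#     if cls is not None:
#         dl = cls(job)
#         dl.download(url, pathfmt)       # 'url' and 'pathfmt' are hypothetical
#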
"""Common classes and constants used by downloader modules.""" import os from .. import config, util _config = config._config class DownloaderBase(): """Base class for downloaders""" scheme = "" def __init__(self, job): extractor = job.extractor self.log = job.get_logger("downloader." + self.scheme) if opts := self._extractor_config(extractor): self.opts = opts self.config = self.config_opts self.out = job.out self.session = extractor.session self.part = self.config("part", True) self.partdir = self.config("part-directory") if self.partdir: self.partdir = util.expand_path(self.partdir) os.makedirs(self.partdir, exist_ok=True) proxies = self.config("proxy", util.SENTINEL) if proxies is util.SENTINEL: self.proxies = extractor._proxies else: self.proxies = util.build_proxy_map(proxies, self.log) def config(self, key, default=None): """Interpolate downloader config value for 'key'""" return config.interpolate(("downloader", self.scheme), key, default) def config_opts(self, key, default=None, conf=_config): if key in conf: return conf[key] value = self.opts.get(key, util.SENTINEL) if value is not util.SENTINEL: return value return config.interpolate(("downloader", self.scheme), key, default) def _extractor_config(self, extractor): path = extractor._cfgpath if not isinstance(path, list): return self._extractor_opts(path[1], path[2]) opts = {} for cat, sub in reversed(path): if popts := self._extractor_opts(cat, sub): opts.update(popts) return opts def _extractor_opts(self, category, subcategory): cfg = config.get(("extractor",), category) if not cfg: return None if copts := cfg.get(self.scheme): if subcategory in cfg: try: if sopts := cfg[subcategory].get(self.scheme): opts = copts.copy() opts.update(sopts) return opts except Exception: self._report_config_error(subcategory, cfg[subcategory]) return copts if subcategory in cfg: try: return cfg[subcategory].get(self.scheme) except Exception: self._report_config_error(subcategory, cfg[subcategory]) return None def _report_config_error(self, subcategory, value): config.log.warning("Subcategory '%s' set to '%s' instead of object", subcategory, util.json_dumps(value).strip('"')) def download(self, url, pathfmt): """Write data from 'url' into the file specified by 'pathfmt'""" ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/downloader/http.py0000644000175000017500000004752415040344700020225 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2014-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Downloader module for http:// and https:// URLs""" import time import mimetypes from requests.exceptions import RequestException, ConnectionError, Timeout from .common import DownloaderBase from .. 
import text, util, output, exception from ssl import SSLError FLAGS = util.FLAGS class HttpDownloader(DownloaderBase): scheme = "http" def __init__(self, job): DownloaderBase.__init__(self, job) extractor = job.extractor self.downloading = False self.adjust_extension = self.config("adjust-extensions", True) self.chunk_size = self.config("chunk-size", 32768) self.metadata = extractor.config("http-metadata") self.progress = self.config("progress", 3.0) self.validate = self.config("validate", True) self.validate_html = self.config("validate-html", True) self.headers = self.config("headers") self.minsize = self.config("filesize-min") self.maxsize = self.config("filesize-max") self.retries = self.config("retries", extractor._retries) self.retry_codes = self.config("retry-codes", extractor._retry_codes) self.timeout = self.config("timeout", extractor._timeout) self.verify = self.config("verify", extractor._verify) self.mtime = self.config("mtime", True) self.rate = self.config("rate") interval_429 = self.config("sleep-429") if not self.config("consume-content", False): # this resets the underlying TCP connection, and therefore # if the program makes another request to the same domain, # a new connection (either TLS or plain TCP) must be made self.release_conn = lambda resp: resp.close() if self.retries < 0: self.retries = float("inf") if self.minsize: minsize = text.parse_bytes(self.minsize) if not minsize: self.log.warning( "Invalid minimum file size (%r)", self.minsize) self.minsize = minsize if self.maxsize: maxsize = text.parse_bytes(self.maxsize) if not maxsize: self.log.warning( "Invalid maximum file size (%r)", self.maxsize) self.maxsize = maxsize if isinstance(self.chunk_size, str): chunk_size = text.parse_bytes(self.chunk_size) if not chunk_size: self.log.warning( "Invalid chunk size (%r)", self.chunk_size) chunk_size = 32768 self.chunk_size = chunk_size if self.rate: func = util.build_selection_func(self.rate, 0, text.parse_bytes) if rmax := func.args[1] if hasattr(func, "args") else func(): if rmax < self.chunk_size: # reduce chunk_size to allow for one iteration each second self.chunk_size = rmax self.rate = func self.receive = self._receive_rate else: self.log.warning("Invalid rate limit (%r)", self.rate) self.rate = False if self.progress is not None: self.receive = self._receive_rate if self.progress < 0.0: self.progress = 0.0 if interval_429 is None: self.interval_429 = extractor._interval_429 else: self.interval_429 = util.build_duration_func(interval_429) def download(self, url, pathfmt): try: return self._download_impl(url, pathfmt) except Exception as exc: if self.downloading: output.stderr_write("\n") self.log.debug("", exc_info=exc) raise finally: # remove file from incomplete downloads if self.downloading and not self.part: util.remove_file(pathfmt.temppath) def _download_impl(self, url, pathfmt): response = None tries = code = 0 msg = "" metadata = self.metadata kwdict = pathfmt.kwdict expected_status = kwdict.get( "_http_expected_status", ()) adjust_extension = kwdict.get( "_http_adjust_extension", self.adjust_extension) if self.part and not metadata: pathfmt.part_enable(self.partdir) while True: if tries: if response: self.release_conn(response) response = None self.log.warning("%s (%s/%s)", msg, tries, self.retries+1) if tries > self.retries: return False if code == 429 and self.interval_429: s = self.interval_429() time.sleep(s if s > tries else tries) else: time.sleep(tries) code = 0 tries += 1 file_header = None # collect HTTP headers headers = {"Accept": "*/*"} # 
file-specific headers if extra := kwdict.get("_http_headers"): headers.update(extra) # general headers if self.headers: headers.update(self.headers) # partial content if file_size := pathfmt.part_size(): headers["Range"] = f"bytes={file_size}-" # connect to (remote) source try: response = self.session.request( kwdict.get("_http_method", "GET"), url, stream=True, headers=headers, data=kwdict.get("_http_data"), timeout=self.timeout, proxies=self.proxies, verify=self.verify, ) except ConnectionError as exc: try: reason = exc.args[0].reason cls = reason.__class__.__name__ pre, _, err = str(reason.args[-1]).partition(":") msg = f"{cls}: {(err or pre).lstrip()}" except Exception: msg = str(exc) continue except Timeout as exc: msg = str(exc) continue except Exception as exc: self.log.warning(exc) return False # check response code = response.status_code if code == 200 or code in expected_status: # OK offset = 0 size = response.headers.get("Content-Length") elif code == 206: # Partial Content offset = file_size size = response.headers["Content-Range"].rpartition("/")[2] elif code == 416 and file_size: # Requested Range Not Satisfiable break else: msg = f"'{code} {response.reason}' for '{url}'" challenge = util.detect_challenge(response) if challenge is not None: self.log.warning(challenge) if code in self.retry_codes or 500 <= code < 600: continue retry = kwdict.get("_http_retry") if retry and retry(response): continue self.release_conn(response) self.log.warning(msg) return False # check for invalid responses if self.validate and \ (validate := kwdict.get("_http_validate")) is not None: try: result = validate(response) except Exception: self.release_conn(response) raise if isinstance(result, str): url = result tries -= 1 continue if not result: self.release_conn(response) self.log.warning("Invalid response") return False if self.validate_html and response.headers.get( "content-type", "").startswith("text/html") and \ pathfmt.extension not in ("html", "htm"): if response.history: self.log.warning("HTTP redirect to '%s'", response.url) else: self.log.warning("HTML response") return False # check file size size = text.parse_int(size, None) if size is not None: if self.minsize and size < self.minsize: self.release_conn(response) self.log.warning( "File size smaller than allowed minimum (%s < %s)", size, self.minsize) pathfmt.temppath = "" return True if self.maxsize and size > self.maxsize: self.release_conn(response) self.log.warning( "File size larger than allowed maximum (%s > %s)", size, self.maxsize) pathfmt.temppath = "" return True build_path = False # set missing filename extension from MIME type if not pathfmt.extension: pathfmt.set_extension(self._find_extension(response)) build_path = True # set metadata from HTTP headers if metadata: kwdict[metadata] = util.extract_headers(response) build_path = True # build and check file path if build_path: pathfmt.build_path() if pathfmt.exists(): pathfmt.temppath = "" # release the connection back to pool by explicitly # calling .close() # see https://requests.readthedocs.io/en/latest/user # /advanced/#body-content-workflow # when the image size is on the order of megabytes, # re-establishing a TLS connection will typically be faster # than consuming the whole response response.close() return True if self.part and metadata: pathfmt.part_enable(self.partdir) metadata = False content = response.iter_content(self.chunk_size) validate_sig = kwdict.get("_http_signature") validate_ext = (adjust_extension and pathfmt.extension in SIGNATURE_CHECKS) # check 
filename extension against file header if not offset and (validate_ext or validate_sig): try: file_header = next( content if response.raw.chunked else response.iter_content(16), b"") except (RequestException, SSLError) as exc: msg = str(exc) continue if validate_sig: result = validate_sig(file_header) if result is not True: self.release_conn(response) self.log.warning( result or "Invalid file signature bytes") return False if validate_ext and self._adjust_extension( pathfmt, file_header) and pathfmt.exists(): pathfmt.temppath = "" response.close() return True # set open mode if not offset: mode = "w+b" if file_size: self.log.debug("Unable to resume partial download") else: mode = "r+b" self.log.debug("Resuming download at byte %d", offset) # download content self.downloading = True with pathfmt.open(mode) as fp: if fp is None: # '.part' file no longer exists break if file_header: fp.write(file_header) offset += len(file_header) elif offset: if adjust_extension and \ pathfmt.extension in SIGNATURE_CHECKS: self._adjust_extension(pathfmt, fp.read(16)) fp.seek(offset) self.out.start(pathfmt.path) try: self.receive(fp, content, size, offset) except (RequestException, SSLError) as exc: msg = str(exc) output.stderr_write("\n") continue except exception.StopExtraction: response.close() return False except exception.ControlException: response.close() raise # check file size if size and fp.tell() < size: msg = f"file size mismatch ({fp.tell()} < {size})" output.stderr_write("\n") continue break self.downloading = False if self.mtime: if "_http_lastmodified" in kwdict: kwdict["_mtime_http"] = kwdict["_http_lastmodified"] else: kwdict["_mtime_http"] = response.headers.get("Last-Modified") else: kwdict["_mtime_http"] = None return True def release_conn(self, response): """Release connection back to pool by consuming response body""" try: for _ in response.iter_content(self.chunk_size): pass except (RequestException, SSLError) as exc: output.stderr_write("\n") self.log.debug( "Unable to consume response body (%s: %s); " "closing the connection anyway", exc.__class__.__name__, exc) response.close() def receive(self, fp, content, bytes_total, bytes_start): write = fp.write for data in content: write(data) if FLAGS.DOWNLOAD is not None: FLAGS.process("DOWNLOAD") def _receive_rate(self, fp, content, bytes_total, bytes_start): rate = self.rate() if self.rate else None write = fp.write progress = self.progress bytes_downloaded = 0 time_start = time.monotonic() for data in content: time_elapsed = time.monotonic() - time_start bytes_downloaded += len(data) write(data) if FLAGS.DOWNLOAD is not None: FLAGS.process("DOWNLOAD") if progress is not None: if time_elapsed > progress: self.out.progress( bytes_total, bytes_start + bytes_downloaded, int(bytes_downloaded / time_elapsed), ) if rate is not None: time_expected = bytes_downloaded / rate if time_expected > time_elapsed: time.sleep(time_expected - time_elapsed) def _find_extension(self, response): """Get filename extension from MIME type""" mtype = response.headers.get("Content-Type", "image/jpeg") mtype = mtype.partition(";")[0] if "/" not in mtype: mtype = "image/" + mtype if mtype in MIME_TYPES: return MIME_TYPES[mtype] if ext := mimetypes.guess_extension(mtype, strict=False): return ext[1:] self.log.warning("Unknown MIME type '%s'", mtype) return "bin" def _adjust_extension(self, pathfmt, file_header): """Check filename extension against file header""" if not SIGNATURE_CHECKS[pathfmt.extension](file_header): for ext, check in SIGNATURE_CHECKS.items(): if 
check(file_header): pathfmt.set_extension(ext) pathfmt.build_path() return True return False MIME_TYPES = { "image/jpeg" : "jpg", "image/jpg" : "jpg", "image/png" : "png", "image/gif" : "gif", "image/bmp" : "bmp", "image/x-bmp" : "bmp", "image/x-ms-bmp": "bmp", "image/webp" : "webp", "image/avif" : "avif", "image/heic" : "heic", "image/heif" : "heif", "image/svg+xml" : "svg", "image/ico" : "ico", "image/icon" : "ico", "image/x-icon" : "ico", "image/vnd.microsoft.icon" : "ico", "image/x-photoshop" : "psd", "application/x-photoshop" : "psd", "image/vnd.adobe.photoshop": "psd", "video/webm": "webm", "video/ogg" : "ogg", "video/mp4" : "mp4", "video/m4v" : "m4v", "video/x-m4v": "m4v", "video/quicktime": "mov", "audio/wav" : "wav", "audio/x-wav": "wav", "audio/webm" : "webm", "audio/ogg" : "ogg", "audio/mpeg" : "mp3", "application/zip" : "zip", "application/x-zip": "zip", "application/x-zip-compressed": "zip", "application/rar" : "rar", "application/x-rar": "rar", "application/x-rar-compressed": "rar", "application/x-7z-compressed" : "7z", "application/pdf" : "pdf", "application/x-pdf": "pdf", "application/x-shockwave-flash": "swf", "text/html": "html", "application/ogg": "ogg", # https://www.iana.org/assignments/media-types/model/obj "model/obj": "obj", "application/octet-stream": "bin", } def _signature_html(s): s = s[:14].lstrip() return s and b"= 0 else float("inf"), "socket_timeout": self.config("timeout", extractor._timeout), "nocheckcertificate": not self.config("verify", extractor._verify), "proxy": self.proxies.get("http") if self.proxies else None, } self.ytdl_instance = None self.rate_dyn = None self.forward_cookies = self.config("forward-cookies", True) self.progress = self.config("progress", 3.0) self.outtmpl = self.config("outtmpl") def download(self, url, pathfmt): kwdict = pathfmt.kwdict ytdl_instance = kwdict.pop("_ytdl_instance", None) if not ytdl_instance: ytdl_instance = self.ytdl_instance if not ytdl_instance: try: module = ytdl.import_module(self.config("module")) except (ImportError, SyntaxError) as exc: self.log.error("Cannot import module '%s'", getattr(exc, "name", "")) self.log.debug("", exc_info=exc) self.download = lambda u, p: False return False try: ytdl_version = module.version.__version__ except Exception: ytdl_version = "" self.log.debug("Using %s version %s", module, ytdl_version) self.ytdl_instance = ytdl_instance = ytdl.construct_YoutubeDL( module, self, self.ytdl_opts) if self.outtmpl == "default": self.outtmpl = module.DEFAULT_OUTTMPL if self.forward_cookies: self.log.debug("Forwarding cookies to %s", ytdl_instance.__module__) set_cookie = ytdl_instance.cookiejar.set_cookie for cookie in self.session.cookies: set_cookie(cookie) if "__gdl_initialize" in ytdl_instance.params: del ytdl_instance.params["__gdl_initialize"] if self.progress is not None: ytdl_instance.add_progress_hook(self._progress_hook) if rlf := ytdl_instance.params.pop("__gdl_ratelimit_func", False): self.rate_dyn = rlf info_dict = kwdict.pop("_ytdl_info_dict", None) if not info_dict: url = url[5:] try: if manifest := kwdict.pop("_ytdl_manifest", None): info_dict = self._extract_manifest( ytdl_instance, url, manifest, kwdict.pop("_ytdl_manifest_data", None), kwdict.pop("_ytdl_manifest_headers", None)) else: info_dict = self._extract_info(ytdl_instance, url) except Exception as exc: self.log.debug("", exc_info=exc) self.log.warning("%s: %s", exc.__class__.__name__, exc) if not info_dict: return False if "entries" in info_dict: index = kwdict.get("_ytdl_index") if index is None: return 
self._download_playlist( ytdl_instance, pathfmt, info_dict) else: info_dict = info_dict["entries"][index] if extra := kwdict.get("_ytdl_extra"): info_dict.update(extra) return self._download_video(ytdl_instance, pathfmt, info_dict) def _download_video(self, ytdl_instance, pathfmt, info_dict): if "url" in info_dict: text.nameext_from_url(info_dict["url"], pathfmt.kwdict) formats = info_dict.get("requested_formats") if formats and not compatible_formats(formats): info_dict["ext"] = "mkv" elif "ext" not in info_dict: try: info_dict["ext"] = info_dict["formats"][0]["ext"] except LookupError: info_dict["ext"] = "mp4" if self.outtmpl: self._set_outtmpl(ytdl_instance, self.outtmpl) pathfmt.filename = filename = \ ytdl_instance.prepare_filename(info_dict) pathfmt.extension = info_dict["ext"] pathfmt.path = pathfmt.directory + filename pathfmt.realpath = pathfmt.temppath = ( pathfmt.realdirectory + filename) else: pathfmt.set_extension(info_dict["ext"]) pathfmt.build_path() if pathfmt.exists(): pathfmt.temppath = "" return True if self.rate_dyn is not None: # static ratelimits are set in ytdl.construct_YoutubeDL ytdl_instance.params["ratelimit"] = self.rate_dyn() self.out.start(pathfmt.path) if self.part: pathfmt.kwdict["extension"] = pathfmt.prefix filename = pathfmt.build_filename(pathfmt.kwdict) pathfmt.kwdict["extension"] = info_dict["ext"] if self.partdir: path = os.path.join(self.partdir, filename) else: path = pathfmt.realdirectory + filename path = path.replace("%", "%%") + "%(ext)s" else: path = pathfmt.realpath.replace("%", "%%") self._set_outtmpl(ytdl_instance, path) try: ytdl_instance.process_info(info_dict) except Exception as exc: self.log.debug("", exc_info=exc) return False pathfmt.temppath = info_dict.get("filepath") or info_dict["_filename"] return True def _download_playlist(self, ytdl_instance, pathfmt, info_dict): pathfmt.set_extension("%(playlist_index)s.%(ext)s") pathfmt.build_path() self._set_outtmpl(ytdl_instance, pathfmt.realpath) for entry in info_dict["entries"]: if self.rate_dyn is not None: ytdl_instance.params["ratelimit"] = self.rate_dyn() ytdl_instance.process_info(entry) return True def _extract_info(self, ytdl, url): return ytdl.extract_info(url, download=False) def _extract_manifest(self, ytdl, url, manifest_type, manifest_data=None, headers=None): extr = ytdl.get_info_extractor("Generic") video_id = extr._generic_id(url) if manifest_type == "hls": if manifest_data is None: try: fmts, subs = extr._extract_m3u8_formats_and_subtitles( url, video_id, "mp4", headers=headers) except AttributeError: fmts = extr._extract_m3u8_formats( url, video_id, "mp4", headers=headers) subs = None else: try: fmts, subs = extr._parse_m3u8_formats_and_subtitles( url, video_id, "mp4") except AttributeError: fmts = extr._parse_m3u8_formats(url, video_id, "mp4") subs = None elif manifest_type == "dash": if manifest_data is None: try: fmts, subs = extr._extract_mpd_formats_and_subtitles( url, video_id, headers=headers) except AttributeError: fmts = extr._extract_mpd_formats( url, video_id, headers=headers) subs = None else: if isinstance(manifest_data, str): manifest_data = ElementTree.fromstring(manifest_data) try: fmts, subs = extr._parse_mpd_formats_and_subtitles( manifest_data, mpd_id="dash") except AttributeError: fmts = extr._parse_mpd_formats( manifest_data, mpd_id="dash") subs = None else: self.log.error("Unsupported manifest type '%s'", manifest_type) return None info_dict = { "extractor": "", "id" : video_id, "title" : video_id, "formats" : fmts, "subtitles": subs, } return 
ytdl.process_ie_result(info_dict, download=False) def _progress_hook(self, info): if info["status"] == "downloading" and \ info["elapsed"] >= self.progress: total = info.get("total_bytes") or info.get("total_bytes_estimate") speed = info.get("speed") self.out.progress( None if total is None else int(total), info["downloaded_bytes"], int(speed) if speed else 0, ) def _set_outtmpl(self, ytdl_instance, outtmpl): try: ytdl_instance._parse_outtmpl except AttributeError: try: ytdl_instance.outtmpl_dict["default"] = outtmpl except AttributeError: ytdl_instance.params["outtmpl"] = outtmpl else: ytdl_instance.params["outtmpl"] = {"default": outtmpl} def compatible_formats(formats): """Returns True if 'formats' are compatible for merge""" video_ext = formats[0].get("ext") audio_ext = formats[1].get("ext") if video_ext == "webm" and audio_ext == "webm": return True exts = ("mp3", "mp4", "m4a", "m4p", "m4b", "m4r", "m4v", "ismv", "isma") return video_ext in exts and audio_ext in exts __downloader__ = YoutubeDLDownloader ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/exception.py0000644000175000017500000001203415040344700017072 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2015-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Exception classes used by gallery-dl Class Hierarchy: Exception └── GalleryDLException ├── ExtractionError │ ├── HttpError │ │ └── ChallengeError │ ├── AuthorizationError │ │ └── AuthRequired │ ├── AuthenticationError │ └── NotFoundError ├── InputError │ ├── FormatError │ │ ├── FilenameFormatError │ │ └── DirectoryFormatError │ ├── FilterError │ ├── InputFileError │ └── NoExtractorError └── ControlException ├── StopExtraction ├── AbortExtraction ├── TerminateExtraction └── RestartExtraction """ class GalleryDLException(Exception): """Base class for GalleryDL exceptions""" default = None msgfmt = None code = 1 def __init__(self, message=None, fmt=True): if not message: message = self.default elif isinstance(message, Exception): message = f"{message.__class__.__name__}: {message}" if fmt and self.msgfmt is not None: message = self.msgfmt.replace("{}", message) self.message = message Exception.__init__(self, message) ############################################################################### # Extractor Errors ############################################################ class ExtractionError(GalleryDLException): """Base class for exceptions during information extraction""" code = 4 class HttpError(ExtractionError): """HTTP request during data extraction failed""" default = "HTTP request failed" def __init__(self, message="", response=None): self.response = response if response is None: self.status = 0 else: self.status = response.status_code if not message: message = (f"'{response.status_code} {response.reason}' " f"for '{response.url}'") ExtractionError.__init__(self, message) class ChallengeError(HttpError): code = 8 def __init__(self, challenge, response): message = ( f"{challenge} ({response.status_code} {response.reason}) " f"for '{response.url}'") HttpError.__init__(self, message, response) class AuthenticationError(ExtractionError): """Invalid or missing login credentials""" default = "Invalid login credentials" code = 16 class AuthorizationError(ExtractionError): """Insufficient privileges to access a resource""" default = 
"Insufficient privileges to access this resource" code = 16 class AuthRequired(AuthorizationError): default = "Account credentials required" def __init__(self, required=None, message=None): if required and not message: if isinstance(required, str): message = f"{required} required" else: message = f"{' or '.join(required)} required" AuthorizationError.__init__(self, message) class NotFoundError(ExtractionError): """Requested resource (gallery/image) could not be found""" msgfmt = "Requested {} could not be found" default = "resource (gallery/image)" ############################################################################### # User Input ################################################################## class InputError(GalleryDLException): """Error caused by user input and config options""" code = 32 class FormatError(InputError): """Error while building output paths""" class FilenameFormatError(FormatError): """Error while building output filenames""" msgfmt = "Applying filename format string failed ({})" class DirectoryFormatError(FormatError): """Error while building output directory paths""" msgfmt = "Applying directory format string failed ({})" class FilterError(InputError): """Error while evaluating a filter expression""" msgfmt = "Evaluating filter expression failed ({})" class InputFileError(InputError): """Error when parsing an input file""" class NoExtractorError(InputError): """No extractor can handle the given URL""" ############################################################################### # Control Flow ################################################################ class ControlException(GalleryDLException): code = 0 class StopExtraction(ControlException): """Stop data extraction""" class AbortExtraction(ExtractionError, ControlException): """Abort data extraction due to an error""" class TerminateExtraction(ControlException): """Terminate data extraction""" class RestartExtraction(ControlException): """Restart data extraction""" ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1753638554.1103325 gallery_dl-1.30.2/gallery_dl/extractor/0000755000175000017500000000000015041463232016540 5ustar00mikemike././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/2ch.py0000644000175000017500000000562515040344700017573 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://2ch.hk/""" from .common import Extractor, Message from .. import text, util class _2chThreadExtractor(Extractor): """Extractor for 2ch threads""" category = "2ch" subcategory = "thread" root = "https://2ch.hk" directory_fmt = ("{category}", "{board}", "{thread} {title}") filename_fmt = "{tim}{filename:? 
//}.{extension}" archive_fmt = "{board}_{thread}_{tim}" pattern = r"(?:https?://)?2ch\.hk/([^/?#]+)/res/(\d+)" example = "https://2ch.hk/a/res/12345.html" def __init__(self, match): Extractor.__init__(self, match) self.board, self.thread = match.groups() def items(self): url = f"{self.root}/{self.board}/res/{self.thread}.json" posts = self.request_json(url)["threads"][0]["posts"] op = posts[0] title = op.get("subject") or text.remove_html(op["comment"]) thread = { "board" : self.board, "thread": self.thread, "title" : text.unescape(title)[:50], } yield Message.Directory, thread for post in posts: if files := post.get("files"): post["post_name"] = post["name"] post["date"] = text.parse_timestamp(post["timestamp"]) del post["files"] del post["name"] for file in files: file.update(thread) file.update(post) file["filename"] = file["fullname"].rpartition(".")[0] file["tim"], _, file["extension"] = \ file["name"].rpartition(".") yield Message.Url, self.root + file["path"], file class _2chBoardExtractor(Extractor): """Extractor for 2ch boards""" category = "2ch" subcategory = "board" root = "https://2ch.hk" pattern = r"(?:https?://)?2ch\.hk/([^/?#]+)/?$" example = "https://2ch.hk/a/" def __init__(self, match): Extractor.__init__(self, match) self.board = match[1] def items(self): base = f"{self.root}/{self.board}" # index page url = f"{base}/index.json" index = self.request_json(url) index["_extractor"] = _2chThreadExtractor for thread in index["threads"]: url = f"{base}/res/{thread['thread_num']}.html" yield Message.Queue, url, index # pages 1..n for n in util.advance(index["pages"], 1): url = f"{base}/{n}.json" page = self.request_json(url) page["_extractor"] = _2chThreadExtractor for thread in page["threads"]: url = f"{base}/res/{thread['thread_num']}.html" yield Message.Queue, url, page ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/2chan.py0000644000175000017500000000620315040344700020103 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2017-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.2chan.net/""" from .common import Extractor, Message from .. 
import text class _2chanThreadExtractor(Extractor): """Extractor for 2chan threads""" category = "2chan" subcategory = "thread" directory_fmt = ("{category}", "{board_name}", "{thread}") filename_fmt = "{tim}.{extension}" archive_fmt = "{board}_{thread}_{tim}" pattern = r"(?:https?://)?([\w-]+)\.2chan\.net/([^/?#]+)/res/(\d+)" example = "https://dec.2chan.net/12/res/12345.htm" def __init__(self, match): Extractor.__init__(self, match) self.server, self.board, self.thread = match.groups() def items(self): url = (f"https://{self.server}.2chan.net" f"/{self.board}/res/{self.thread}.htm") page = self.request(url).text data = self.metadata(page) yield Message.Directory, data for post in self.posts(page): if "filename" not in post: continue post.update(data) url = (f"https://{post['server']}.2chan.net" f"/{post['board']}/src/{post['filename']}") yield Message.Url, url, post def metadata(self, page): """Collect metadata for extractor-job""" title, _, boardname = text.extr( page, "", "").rpartition(" - ") return { "server": self.server, "title": title, "board": self.board, "board_name": boardname[:-4], "thread": self.thread, } def posts(self, page): """Build a list of all post-objects""" page = text.extr( page, '
') return [ self.parse(post) for post in page.split('') ] def parse(self, post): """Build post-object by extracting data from an HTML post""" data = self._extract_post(post) if data["name"]: data["name"] = data["name"].strip() path = text.extr(post, '' , '<'), ("name", 'class="cnm">' , '<'), ("now" , 'class="cnw">' , '<'), ("no" , 'class="cno">No.', '<'), (None , '', ''), ))[0] def _extract_image(self, post, data): text.extract_all(post, ( (None , '_blank', ''), ("filename", '>', '<'), ("fsize" , '(', ' '), ), 0, data) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/2chen.py0000644000175000017500000000635315040344700020115 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://sturdychan.help/""" from .common import Extractor, Message from .. import text BASE_PATTERN = r"(?:https?://)?(?:sturdychan.help|2chen\.(?:moe|club))" class _2chenThreadExtractor(Extractor): """Extractor for 2chen threads""" category = "2chen" subcategory = "thread" root = "https://sturdychan.help" directory_fmt = ("{category}", "{board}", "{thread} {title}") filename_fmt = "{time} {filename}.{extension}" archive_fmt = "{board}_{thread}_{hash}_{time}" pattern = BASE_PATTERN + r"/([^/?#]+)/(\d+)" example = "https://sturdychan.help/a/12345/" def __init__(self, match): Extractor.__init__(self, match) self.board, self.thread = match.groups() def items(self): url = f"{self.root}/{self.board}/{self.thread}" page = self.request(url, encoding="utf-8", notfound="thread").text data = self.metadata(page) yield Message.Directory, data for post in self.posts(page): url = post["url"] if not url: continue if url[0] == "/": url = self.root + url post["url"] = url = url.partition("?")[0] post.update(data) post["time"] = text.parse_int(post["date"].timestamp()) yield Message.Url, url, text.nameext_from_url( post["filename"], post) def metadata(self, page): board, pos = text.extract(page, 'class="board">/', '/<') title = text.extract(page, "

", "

", pos)[0] return { "board" : board, "thread": self.thread, "title" : text.unescape(title), } def posts(self, page): """Return iterable with relevant posts""" return map(self.parse, text.extract_iter( page, 'class="glass media', '')) def parse(self, post): extr = text.extract_from(post) return { "name" : text.unescape(extr("", "")), "date" : text.parse_datetime( extr("")[2], "%d %b %Y (%a) %H:%M:%S" ), "no" : extr('href="#p', '"'), "url" : extr('
[^&#]+)") example = "http://behoimi.org/post?tags=TAG" def posts(self): params = {"tags": self.tags} return self._pagination(self.root + "/post/index.json", params) class _3dbooruPoolExtractor(_3dbooruBase, moebooru.MoebooruPoolExtractor): """Extractor for image-pools from behoimi.org""" pattern = r"(?:https?://)?(?:www\.)?behoimi\.org/pool/show/(?P\d+)" example = "http://behoimi.org/pool/show/12345" def posts(self): params = {"tags": "pool:" + self.pool_id} return self._pagination(self.root + "/post/index.json", params) class _3dbooruPostExtractor(_3dbooruBase, moebooru.MoebooruPostExtractor): """Extractor for single images from behoimi.org""" pattern = r"(?:https?://)?(?:www\.)?behoimi\.org/post/show/(?P\d+)" example = "http://behoimi.org/post/show/12345" def posts(self): params = {"tags": "id:" + self.post_id} return self._pagination(self.root + "/post/index.json", params) class _3dbooruPopularExtractor( _3dbooruBase, moebooru.MoebooruPopularExtractor): """Extractor for popular images from behoimi.org""" pattern = (r"(?:https?://)?(?:www\.)?behoimi\.org" r"/post/popular_(?Pby_(?:day|week|month)|recent)" r"(?:\?(?P[^#]*))?") example = "http://behoimi.org/post/popular_by_month" ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/4archive.py0000644000175000017500000000734415040344700020624 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://4archive.org/""" from .common import Extractor, Message from .. import text, util class _4archiveThreadExtractor(Extractor): """Extractor for 4archive threads""" category = "4archive" subcategory = "thread" directory_fmt = ("{category}", "{board}", "{thread} {title}") filename_fmt = "{no} {filename}.{extension}" archive_fmt = "{board}_{thread}_{no}" root = "https://4archive.org" referer = False pattern = r"(?:https?://)?4archive\.org/board/([^/?#]+)/thread/(\d+)" example = "https://4archive.org/board/a/thread/12345/" def __init__(self, match): Extractor.__init__(self, match) self.board, self.thread = match.groups() def items(self): url = f"{self.root}/board/{self.board}/thread/{self.thread}" page = self.request(url).text data = self.metadata(page) posts = self.posts(page) if not data["title"]: data["title"] = posts[0]["com"][:50] for post in posts: post.update(data) post["time"] = int(util.datetime_to_timestamp(post["date"])) yield Message.Directory, post if "url" in post: yield Message.Url, post["url"], text.nameext_from_url( post["filename"], post) def metadata(self, page): return { "board" : self.board, "thread": text.parse_int(self.thread), "title" : text.unescape(text.extr( page, 'class="subject">', "")) } def posts(self, page): return [ self.parse(post) for post in page.split('class="postContainer')[1:] ] def parse(self, post): extr = text.extract_from(post) data = { "name": extr('class="name">', ""), "date": text.parse_datetime( extr('class="dateTime postNum" >', "<").strip(), "%Y-%m-%d %H:%M:%S"), "no" : text.parse_int(extr(">Post No.", "<")), } if 'class="file"' in post: extr('class="fileText"', ">File: ").strip()[1:], "size" : text.parse_bytes(extr(" (", ", ")[:-1]), "width" : text.parse_int(extr("", "x")), "height" : text.parse_int(extr("", "px")), }) extr("
", "
"))) return data class _4archiveBoardExtractor(Extractor): """Extractor for 4archive boards""" category = "4archive" subcategory = "board" root = "https://4archive.org" pattern = r"(?:https?://)?4archive\.org/board/([^/?#]+)(?:/(\d+))?/?$" example = "https://4archive.org/board/a/" def __init__(self, match): Extractor.__init__(self, match) self.board = match[1] self.num = text.parse_int(match[2], 1) def items(self): data = {"_extractor": _4archiveThreadExtractor} while True: url = f"{self.root}/board/{self.board}/{self.num}" page = self.request(url).text if 'class="thread"' not in page: return for thread in text.extract_iter(page, 'class="thread" id="t', '"'): url = f"{self.root}/board/{self.board}/thread/{thread}" yield Message.Queue, url, data self.num += 1 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753381746.0 gallery_dl-1.30.2/gallery_dl/extractor/4chan.py0000644000175000017500000000546515040475562020131 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2015-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.4chan.org/""" from .common import Extractor, Message from .. import text class _4chanThreadExtractor(Extractor): """Extractor for 4chan threads""" category = "4chan" subcategory = "thread" directory_fmt = ("{category}", "{board}", "{thread} {title}") filename_fmt = "{tim} {filename}.{extension}" archive_fmt = "{board}_{thread}_{tim}" pattern = (r"(?:https?://)?boards\.4chan(?:nel)?\.org" r"/([^/]+)/thread/(\d+)") example = "https://boards.4channel.org/a/thread/12345/" def __init__(self, match): Extractor.__init__(self, match) self.board, self.thread = match.groups() def items(self): url = f"https://a.4cdn.org/{self.board}/thread/{self.thread}.json" posts = self.request_json(url)["posts"] title = posts[0].get("sub") or text.remove_html(posts[0]["com"]) data = { "board" : self.board, "thread": self.thread, "title" : text.unescape(title)[:50], } yield Message.Directory, data for post in posts: if "filename" in post: post.update(data) post["extension"] = post["ext"][1:] post["filename"] = text.unescape(post["filename"]) post["_http_signature"] = _detect_null_byte url = (f"https://i.4cdn.org" f"/{post['board']}/{post['tim']}{post['ext']}") yield Message.Url, url, post def _detect_null_byte(signature): """Return False if all file signature bytes are null""" if signature: if signature[0]: return True for byte in signature: if byte: return True return "File data consists of null bytes" class _4chanBoardExtractor(Extractor): """Extractor for 4chan boards""" category = "4chan" subcategory = "board" pattern = r"(?:https?://)?boards\.4chan(?:nel)?\.org/([^/?#]+)/\d*$" example = "https://boards.4channel.org/a/" def __init__(self, match): Extractor.__init__(self, match) self.board = match[1] def items(self): url = f"https://a.4cdn.org/{self.board}/threads.json" threads = self.request_json(url) for page in threads: for thread in page["threads"]: url = (f"https://boards.4chan.org" f"/{self.board}/thread/{thread['no']}/") thread["page"] = page["page"] thread["_extractor"] = _4chanThreadExtractor yield Message.Queue, url, thread ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/4chanarchives.py0000644000175000017500000000761515040344700021642 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 
2023-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://4chanarchives.com/""" from .common import Extractor, Message from .. import text class _4chanarchivesThreadExtractor(Extractor): """Extractor for threads on 4chanarchives.com""" category = "4chanarchives" subcategory = "thread" root = "https://4chanarchives.com" directory_fmt = ("{category}", "{board}", "{thread} - {title}") filename_fmt = "{no}-{filename}.{extension}" archive_fmt = "{board}_{thread}_{no}" referer = False pattern = r"(?:https?://)?4chanarchives\.com/board/([^/?#]+)/thread/(\d+)" example = "https://4chanarchives.com/board/a/thread/12345/" def __init__(self, match): Extractor.__init__(self, match) self.board, self.thread = match.groups() def items(self): url = f"{self.root}/board/{self.board}/thread/{self.thread}" page = self.request(url).text data = self.metadata(page) posts = self.posts(page) if not data["title"]: data["title"] = text.unescape(text.remove_html( posts[0]["com"]))[:50] for post in posts: post.update(data) yield Message.Directory, post if "url" in post: yield Message.Url, post["url"], post def metadata(self, page): return { "board" : self.board, "thread" : self.thread, "title" : text.unescape(text.extr( page, 'property="og:title" content="', '"')), } def posts(self, page): """Build a list of all post objects""" return [self.parse(html) for html in text.extract_iter( page, 'id="pc', '')] def parse(self, html): """Build post object by extracting data from an HTML post""" post = self._extract_post(html) if ">File: <" in html: self._extract_file(html, post) post["extension"] = post["url"].rpartition(".")[2] return post def _extract_post(self, html): extr = text.extract_from(html) return { "no" : text.parse_int(extr('', '"')), "name": extr('class="name">', '<'), "time": extr('class="dateTime postNum" >', '<').rstrip(), "com" : text.unescape( html[html.find('")[2]), } def _extract_file(self, html, post): extr = text.extract_from(html, html.index(">File: <")) post["url"] = extr('href="', '"') post["filename"] = text.unquote(extr(">", "<").rpartition(".")[0]) post["fsize"] = extr("(", ", ") post["w"] = text.parse_int(extr("", "x")) post["h"] = text.parse_int(extr("", ")")) class _4chanarchivesBoardExtractor(Extractor): """Extractor for boards on 4chanarchives.com""" category = "4chanarchives" subcategory = "board" root = "https://4chanarchives.com" pattern = r"(?:https?://)?4chanarchives\.com/board/([^/?#]+)(?:/(\d+))?/?$" example = "https://4chanarchives.com/board/a/" def __init__(self, match): Extractor.__init__(self, match) self.board, self.page = match.groups() def items(self): data = {"_extractor": _4chanarchivesThreadExtractor} pnum = text.parse_int(self.page, 1) needle = '''
data["pageCount"]: return url = f"{self.root}/{board}/{pnum}.json" threads = self.request_json(url)["threads"] ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/8muses.py0000644000175000017500000000714215040344700020337 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2019-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://comics.8muses.com/""" from .common import Extractor, Message from .. import text, util class _8musesAlbumExtractor(Extractor): """Extractor for image albums on comics.8muses.com""" category = "8muses" subcategory = "album" directory_fmt = ("{category}", "{album[path]}") filename_fmt = "{page:>03}.{extension}" archive_fmt = "{hash}" root = "https://comics.8muses.com" pattern = (r"(?:https?://)?(?:comics\.|www\.)?8muses\.com" r"(/comics/album/[^?#]+)(\?[^#]+)?") example = "https://comics.8muses.com/comics/album/PATH/TITLE" def __init__(self, match): Extractor.__init__(self, match) self.path = match[1] self.params = match[2] or "" def items(self): url = self.root + self.path + self.params while True: data = self._unobfuscate(text.extr( self.request(url).text, 'id="ractive-public" type="text/plain">', '')) if images := data.get("pictures"): count = len(images) album = self._make_album(data["album"]) yield Message.Directory, {"album": album, "count": count} for num, image in enumerate(images, 1): url = self.root + "/image/fl/" + image["publicUri"] img = { "url" : url, "page" : num, "hash" : image["publicUri"], "count" : count, "album" : album, "extension": "jpg", } yield Message.Url, url, img if albums := data.get("albums"): for album in albums: permalink = album.get("permalink") if not permalink: self.log.debug("Private album") continue url = self.root + "/comics/album/" + permalink yield Message.Queue, url, { "url" : url, "name" : album["name"], "private" : album["isPrivate"], "_extractor": _8musesAlbumExtractor, } if data["page"] >= data["pages"]: return path, _, num = self.path.rstrip("/").rpartition("/") path = path if num.isdecimal() else self.path url = f"{self.root}{path}/{data['page'] + 1}{self.params}" def _make_album(self, album): return { "id" : album["id"], "path" : album["path"], "parts" : album["path"].split("/"), "title" : album["name"], "private": album["isPrivate"], "url" : self.root + "/comics/album/" + album["permalink"], "parent" : text.parse_int(album["parentId"]), "views" : text.parse_int(album["numberViews"]), "likes" : text.parse_int(album["numberLikes"]), "date" : text.parse_datetime( album["updatedAt"], "%Y-%m-%dT%H:%M:%S.%fZ"), } def _unobfuscate(self, data): return util.json_loads("".join([ chr(33 + (ord(c) + 14) % 94) if "!" <= c <= "~" else c for c in text.unescape(data.strip("\t\n\r !")) ])) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753381746.0 gallery_dl-1.30.2/gallery_dl/extractor/__init__.py0000644000175000017500000001302315040475562020660 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2015-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. 
import sys from ..text import re_compile modules = [ "2ch", "2chan", "2chen", "35photo", "3dbooru", "4chan", "4archive", "4chanarchives", "500px", "8chan", "8muses", "adultempire", "agnph", "ao3", "arcalive", "architizer", "artstation", "aryion", "batoto", "bbc", "behance", "bilibili", "blogger", "bluesky", "boosty", "bunkr", "catbox", "chevereto", "cien", "civitai", "comick", "comicvine", "cyberdrop", "danbooru", "dankefuerslesen", "desktopography", "deviantart", "discord", "dynastyscans", "e621", "erome", "everia", "exhentai", "facebook", "fanbox", "fantia", "fapello", "fapachi", "flickr", "furaffinity", "furry34", "fuskator", "gelbooru", "gelbooru_v01", "gelbooru_v02", "girlsreleased", "girlswithmuscle", "gofile", "hatenablog", "hentai2read", "hentaicosplays", "hentaifoundry", "hentaihand", "hentaihere", "hentainexus", "hiperdex", "hitomi", "hotleak", "idolcomplex", "imagebam", "imagechest", "imagefap", "imgbb", "imgbox", "imgth", "imgur", "imhentai", "inkbunny", "instagram", "issuu", "itaku", "itchio", "iwara", "jschan", "kabeuchi", "keenspot", "kemono", "khinsider", "komikcast", "leakgallery", "lensdump", "lexica", "lightroom", "livedoor", "lofter", "luscious", "lynxchan", "madokami", "mangadex", "mangafox", "mangahere", "manganelo", "mangapark", "mangaread", "mangoxo", "misskey", "motherless", "myhentaigallery", "myportfolio", "naverblog", "naverchzzk", "naverwebtoon", "nekohouse", "newgrounds", "nhentai", "nijie", "nitter", "nozomi", "nsfwalbum", "nudostar", "paheal", "patreon", "pexels", "philomena", "photovogue", "picarto", "pictoa", "piczel", "pillowfort", "pinterest", "pixeldrain", "pixiv", "pixnet", "plurk", "poipiku", "poringa", "pornhub", "pornpics", "postmill", "rawkuma", "reactor", "readcomiconline", "realbooru", "redbust", "reddit", "redgifs", "rule34us", "rule34vault", "rule34xyz", "saint", "sankaku", "sankakucomplex", "schalenetwork", "scrolller", "seiga", "senmanga", "sexcom", "shimmie2", "simplyhentai", "skeb", "slickpic", "slideshare", "smugmug", "soundgasm", "speakerdeck", "steamgriddb", "subscribestar", "szurubooru", "tapas", "tcbscans", "telegraph", "tenor", "tiktok", "tmohentai", "toyhouse", "tsumino", "tumblr", "tumblrgallery", "twibooru", "twitter", "urlgalleries", "unsplash", "uploadir", "urlshortener", "vanillarock", "vichan", "vipergirls", "vk", "vsco", "wallhaven", "wallpapercave", "warosu", "weasyl", "webmshare", "webtoons", "weebcentral", "weibo", "wikiart", "wikifeet", "wikimedia", "xfolio", "xhamster", "xvideos", "yiffverse", "zerochan", "zzup", "booru", "moebooru", "foolfuuka", "foolslide", "mastodon", "shopify", "lolisafe", "imagehosts", "directlink", "recursive", "oauth", "noop", "ytdl", "generic", ] def find(url): """Find a suitable extractor for the given URL""" for cls in _list_classes(): if match := cls.pattern.match(url): return cls(match) return None def add(cls): """Add 'cls' to the list of available extractors""" if isinstance(cls.pattern, str): cls.pattern = re_compile(cls.pattern) _cache.append(cls) return cls def add_module(module): """Add all extractors in 'module' to the list of available extractors""" if classes := _get_classes(module): if isinstance(classes[0].pattern, str): for cls in classes: cls.pattern = re_compile(cls.pattern) _cache.extend(classes) return classes def extractors(): """Yield all available extractor classes""" return sorted( _list_classes(), key=lambda x: x.__name__ ) # -------------------------------------------------------------------- # internals def _list_classes(): """Yield available extractor classes""" yield 
from _cache for module in _module_iter: yield from add_module(module) globals()["_list_classes"] = lambda : _cache def _modules_internal(): globals_ = globals() for module_name in modules: yield __import__(module_name, globals_, None, (), 1) def _modules_path(path, files): sys.path.insert(0, path) try: return [ __import__(name[:-3]) for name in files if name.endswith(".py") ] finally: del sys.path[0] def _get_classes(module): """Return a list of all extractor classes in a module""" return [ cls for cls in module.__dict__.values() if ( hasattr(cls, "pattern") and cls.__module__ == module.__name__ ) ] _cache = [] _module_iter = _modules_internal() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/adultempire.py0000644000175000017500000000354415040344700021430 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2019-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.adultempire.com/""" from .common import GalleryExtractor from .. import text class AdultempireGalleryExtractor(GalleryExtractor): """Extractor for image galleries from www.adultempire.com""" category = "adultempire" root = "https://www.adultempire.com" pattern = (r"(?:https?://)?(?:www\.)?adult(?:dvd)?empire\.com" r"(/(\d+)/gallery\.html)") example = "https://www.adultempire.com/12345/gallery.html" def __init__(self, match): GalleryExtractor.__init__(self, match) self.gallery_id = match[2] def _init(self): self.cookies.set("ageConfirmed", "true", domain="www.adultempire.com") def metadata(self, page): extr = text.extract_from(page, page.index('
')) return { "gallery_id": text.parse_int(self.gallery_id), "title" : text.unescape(extr('title="', '"')), "studio" : extr(">studio", "<").strip(), "date" : text.parse_datetime(extr( ">released", "<").strip(), "%m/%d/%Y"), "actors" : sorted(text.split_html(extr( '
    = int(attrib["count"]): return params["page"] += 1 def _html(self, post): url = f"{self.root}/gallery/post/show/{post['id']}/" return self.request(url).text def _tags(self, post, page): tag_container = text.extr( page, '
      ', '

      Statistics

      ') if not tag_container: return tags = collections.defaultdict(list) pattern = util.re(r'class="(.)typetag">([^<]+)') for tag_type, tag_name in pattern.findall(tag_container): tags[tag_type].append(text.unquote(tag_name).replace(" ", "_")) for key, value in tags.items(): post["tags_" + self.TAG_TYPES[key]] = " ".join(value) class AgnphTagExtractor(AgnphExtractor): subcategory = "tag" directory_fmt = ("{category}", "{search_tags}") archive_fmt = "t_{search_tags}_{id}" pattern = BASE_PATTERN + r"/gallery/post/(?:\?([^#]+))?$" example = "https://agn.ph/gallery/post/?search=TAG" def __init__(self, match): AgnphExtractor.__init__(self, match) self.params = text.parse_query(self.groups[0]) def metadata(self): return {"search_tags": self.params.get("search") or ""} def posts(self): url = self.root + "/gallery/post/" return self._pagination(url, self.params.copy()) class AgnphPostExtractor(AgnphExtractor): subcategory = "post" archive_fmt = "{id}" pattern = BASE_PATTERN + r"/gallery/post/show/(\d+)" example = "https://agn.ph/gallery/post/show/12345/" def posts(self): url = f"{self.root}/gallery/post/show/{self.groups[0]}/?api=xml" post = self.request_xml(url) return (self._xml_to_dict(post),) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/ao3.py0000644000175000017500000002670115040344700017577 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2024-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://archiveofourown.org/""" from .common import Extractor, Message, Dispatch from .. import text, util, exception from ..cache import cache BASE_PATTERN = (r"(?:https?://)?(?:www\.)?" 
r"a(?:rchiveofourown|o3)\.(?:org|com|net)") class Ao3Extractor(Extractor): """Base class for ao3 extractors""" category = "ao3" root = "https://archiveofourown.org" categorytransfer = True cookies_domain = ".archiveofourown.org" cookies_names = ("remember_user_token",) request_interval = (0.5, 1.5) def items(self): self.login() base = self.root + "/works/" data = {"_extractor": Ao3WorkExtractor, "type": "work"} for work_id in self.works(): yield Message.Queue, base + work_id, data def items_list(self, type, needle, part=True): self.login() base = self.root + "/" data_work = {"_extractor": Ao3WorkExtractor, "type": "work"} data_series = {"_extractor": Ao3SeriesExtractor, "type": "series"} data_user = {"_extractor": Ao3UserExtractor, "type": "user"} for item in self._pagination(self.groups[0], needle): path = item.rpartition("/")[0] if part else item url = base + path if item.startswith("works/"): yield Message.Queue, url, data_work elif item.startswith("series/"): yield Message.Queue, url, data_series elif item.startswith("users/"): yield Message.Queue, url, data_user else: self.log.warning("Unsupported %s type '%s'", type, path) def works(self): return self._pagination(self.groups[0]) def login(self): if self.cookies_check(self.cookies_names): return username, password = self._get_auth_info() if username: return self.cookies_update(self._login_impl(username, password)) @cache(maxage=90*86400, keyarg=1) def _login_impl(self, username, password): self.log.info("Logging in as %s", username) url = self.root + "/users/login" page = self.request(url).text pos = page.find('id="loginform"') token = text.extract( page, ' name="authenticity_token" value="', '"', pos)[0] if not token: self.log.error("Unable to extract 'authenticity_token'") data = { "authenticity_token": text.unescape(token), "user[login]" : username, "user[password]" : password, "user[remember_me]" : "1", "commit" : "Log In", } response = self.request(url, method="POST", data=data) if not response.history: raise exception.AuthenticationError() remember = response.history[0].cookies.get("remember_user_token") if not remember: raise exception.AuthenticationError() return { "remember_user_token": remember, "user_credentials" : "1", } def _pagination(self, path, needle='
    ") for dl in text.extract_iter(download, ' href="', "') fmts[type.lower()] = path data = { "id" : text.parse_int(work_id), "rating" : text.split_html( extr('
    ', "
    ")), "warnings" : text.split_html( extr('
    ', "
    ")), "categories" : text.split_html( extr('
    ', "
    ")), "fandom" : text.split_html( extr('
    ', "
    ")), "relationships": text.split_html( extr('
    ', "
    ")), "characters" : text.split_html( extr('
    ', "
    ")), "tags" : text.split_html( extr('
    ', "
    ")), "lang" : extr('
    ', "
    "), "date" : text.parse_datetime( extr('
    ', "<"), "%Y-%m-%d"), "date_completed": text.parse_datetime( extr('>Completed:
    ', "<"), "%Y-%m-%d"), "date_updated" : text.parse_timestamp( path.rpartition("updated_at=")[2]), "words" : text.parse_int( extr('
    ', "<").replace(",", "")), "chapters" : chapters, "comments" : text.parse_int( extr('
    ', "<").replace(",", "")), "likes" : text.parse_int( extr('
    ', "<").replace(",", "")), "bookmarks" : text.parse_int(text.remove_html( extr('
    ', "
    ")).replace(",", "")), "views" : text.parse_int( extr('
    ', "<").replace(",", "")), "title" : text.unescape(text.remove_html( extr(' class="title heading">', "")).strip()), "author" : text.unescape(text.remove_html( extr(' class="byline heading">', ""))), "summary" : text.split_html( extr(' class="heading">Summary:', "
")), } data["language"] = util.code_to_language(data["lang"]) if series := data["series"]: extr = text.extract_from(series) data["series"] = { "prev" : extr(' class="previous" href="/works/', '"'), "index": extr(' class="position">Part ', " "), "id" : extr(' href="/series/', '"'), "name" : text.unescape(extr(">", "<")), "next" : extr(' class="next" href="/works/', '"'), } else: data["series"] = None yield Message.Directory, data for fmt in self.formats: try: url = text.urljoin(self.root, fmts[fmt]) except KeyError: self.log.warning("%s: Format '%s' not available", work_id, fmt) else: yield Message.Url, url, text.nameext_from_url(url, data) class Ao3SeriesExtractor(Ao3Extractor): """Extractor for AO3 works of a series""" subcategory = "series" pattern = BASE_PATTERN + r"(/series/(\d+))" example = "https://archiveofourown.org/series/12345" class Ao3TagExtractor(Ao3Extractor): """Extractor for AO3 works by tag""" subcategory = "tag" pattern = BASE_PATTERN + r"(/tags/([^/?#]+)/works(?:/?\?.+)?)" example = "https://archiveofourown.org/tags/TAG/works" class Ao3SearchExtractor(Ao3Extractor): """Extractor for AO3 search results""" subcategory = "search" pattern = BASE_PATTERN + r"(/works/search/?\?.+)" example = "https://archiveofourown.org/works/search?work_search[query]=air" class Ao3UserExtractor(Dispatch, Ao3Extractor): """Extractor for an AO3 user profile""" pattern = (BASE_PATTERN + r"/users/([^/?#]+(?:/pseuds/[^/?#]+)?)" r"(?:/profile)?/?(?:$|\?|#)") example = "https://archiveofourown.org/users/USER" def items(self): base = f"{self.root}/users/{self.groups[0]}/" return self._dispatch_extractors(( (Ao3UserWorksExtractor , base + "works"), (Ao3UserSeriesExtractor , base + "series"), (Ao3UserBookmarkExtractor, base + "bookmarks"), ), ("user-works", "user-series")) class Ao3UserWorksExtractor(Ao3Extractor): """Extractor for works of an AO3 user""" subcategory = "user-works" pattern = (BASE_PATTERN + r"(/users/([^/?#]+)/(?:pseuds/([^/?#]+)/)?" r"works(?:/?\?.+)?)") example = "https://archiveofourown.org/users/USER/works" class Ao3UserSeriesExtractor(Ao3Extractor): """Extractor for series of an AO3 user""" subcategory = "user-series" pattern = (BASE_PATTERN + r"(/users/([^/?#]+)/(?:pseuds/([^/?#]+)/)?" r"series(?:/?\?.+)?)") example = "https://archiveofourown.org/users/USER/series" def items(self): self.login() base = self.root + "/series/" data = {"_extractor": Ao3SeriesExtractor} for series_id in self.series(): yield Message.Queue, base + series_id, data def series(self): return self._pagination(self.groups[0], '
  • \n]+)").findall( post["content"]): if not self.emoticons and 'class="arca-emoticon"' in media: continue src = (text.extr(media, 'data-originalurl="', '"') or text.extr(media, 'src="', '"')) if not src: continue src, _, query = text.unescape(src).partition("?") if src[0] == "/": if src[1] == "/": url = "https:" + src.replace( "//ac-p.namu", "//ac-o.namu", 1) else: url = self.root + src else: url = src fallback = () query = f"?type=orig&{query}" if orig := text.extr(media, 'data-orig="', '"'): path, _, ext = url.rpartition(".") if ext != orig: fallback = (url + query,) url = path + "." + orig elif video and self.gifs: url_gif = url.rpartition(".")[0] + ".gif" if self.gifs_fallback: fallback = (url + query,) url = url_gif else: response = self.request( url_gif + query, method="HEAD", fatal=False) if response.status_code < 400: fallback = (url + query,) url = url_gif files.append({ "url" : url + query, "width" : text.parse_int(text.extr(media, 'width="', '"')), "height": text.parse_int(text.extr(media, 'height="', '"')), "_fallback": fallback, }) return files class ArcaliveBoardExtractor(ArcaliveExtractor): """Extractor for an arca.live board's posts""" subcategory = "board" pattern = BASE_PATTERN + r"/b/([^/?#]+)/?(?:\?([^#]+))?$" example = "https://arca.live/b/breaking" def articles(self): self.board, query = self.groups params = text.parse_query(query) return self.api.board(self.board, params) class ArcaliveUserExtractor(ArcaliveExtractor): """Extractor for an arca.live users's posts""" subcategory = "user" pattern = BASE_PATTERN + r"/u/@([^/?#]+)/?(?:\?([^#]+))?$" example = "https://arca.live/u/@USER" def articles(self): self.board = None user, query = self.groups params = text.parse_query(query) return self.api.user_posts(text.unquote(user), params) class ArcaliveAPI(): def __init__(self, extractor): self.extractor = extractor self.log = extractor.log self.root = extractor.root + "/api/app" extractor.session.headers["X-Device-Token"] = util.generate_token(64) def board(self, board_slug, params): endpoint = "/list/channel/" + board_slug return self._pagination(endpoint, params, "articles") def post(self, post_id): endpoint = "/view/article/breaking/" + str(post_id) return self._call(endpoint) def user_posts(self, username, params): endpoint = "/list/channel/breaking" params["target"] = "nickname" params["keyword"] = username return self._pagination(endpoint, params, "articles") def _call(self, endpoint, params=None): url = self.root + endpoint response = self.extractor.request(url, params=params) data = response.json() if response.status_code == 200: return data self.log.debug("Server response: %s", data) msg = f": {msg}" if (msg := data.get("message")) else "" raise exception.AbortExtraction(f"API request failed{msg}") def _pagination(self, endpoint, params, key): while True: data = self._call(endpoint, params) posts = data.get(key) if not posts: break yield from posts params.update(data["next"]) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/architizer.py0000644000175000017500000000566215040344700021264 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2021-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://architizer.com/""" from .common import GalleryExtractor, Extractor, Message from .. 
import text class ArchitizerProjectExtractor(GalleryExtractor): """Extractor for project pages on architizer.com""" category = "architizer" subcategory = "project" root = "https://architizer.com" directory_fmt = ("{category}", "{firm}", "{title}") filename_fmt = "{filename}.{extension}" archive_fmt = "{gid}_{num}" pattern = r"(?:https?://)?architizer\.com/projects/([^/?#]+)" example = "https://architizer.com/projects/NAME/" def __init__(self, match): url = f"{self.root}/projects/{match[1]}/" GalleryExtractor.__init__(self, match, url) def metadata(self, page): extr = text.extract_from(page) extr('id="Pages"', "") return { "title" : extr("data-name='", "'"), "slug" : extr("data-slug='", "'"), "gid" : extr("data-gid='", "'").rpartition(".")[2], "firm" : extr("data-firm-leaders-str='", "'"), "location" : extr("

    ", "<").strip(), "type" : text.unescape(text.remove_html(extr( '
    Type
    ', 'STATUS', 'YEAR', 'SIZE', '', '') .replace("
    ", "\n")), } def images(self, page): return [ (url, None) for url in text.extract_iter( page, 'property="og:image:secure_url" content="', "?") ] class ArchitizerFirmExtractor(Extractor): """Extractor for all projects of a firm""" category = "architizer" subcategory = "firm" root = "https://architizer.com" pattern = r"(?:https?://)?architizer\.com/firms/([^/?#]+)" example = "https://architizer.com/firms/NAME/" def __init__(self, match): Extractor.__init__(self, match) self.firm = match[1] def items(self): url = url = f"{self.root}/firms/{self.firm}/?requesting_merlin=pages" page = self.request(url).text data = {"_extractor": ArchitizerProjectExtractor} for project in text.extract_iter(page, '
    = data["total_count"]: return params["page"] += 1 def _init_csrf_token(self): url = self.root + "/api/v2/csrf_protection/token.json" headers = { "Accept" : "*/*", "Origin" : self.root, } return self.request_json( url, method="POST", headers=headers, json={})["public_csrf_token"] def _no_cache(self, url): """Cause a cache miss to prevent Cloudflare 'optimizations' Cloudflare's 'Polish' optimization strips image metadata and may even recompress an image as lossy JPEG. This can be prevented by causing a cache miss when requesting an image by adding a random dummy query parameter. Ref: https://github.com/r888888888/danbooru/issues/3528 https://danbooru.donmai.us/forum_topics/14952 """ sep = "&" if "?" in url else "?" token = util.generate_token(8) return url + sep + token[:4] + "=" + token[4:] class ArtstationUserExtractor(ArtstationExtractor): """Extractor for all projects of an artstation user""" subcategory = "user" pattern = (r"(?:https?://)?(?:(?:www\.)?artstation\.com" r"/(?!artwork|projects|search)([^/?#]+)(?:/albums/all)?" r"|((?!www)[\w-]+)\.artstation\.com(?:/projects)?)/?$") example = "https://www.artstation.com/USER" def projects(self): url = f"{self.root}/users/{self.user}/projects.json" params = {"album_id": "all"} return self._pagination(url, params) class ArtstationAlbumExtractor(ArtstationExtractor): """Extractor for all projects in an artstation album""" subcategory = "album" directory_fmt = ("{category}", "{userinfo[username]}", "Albums", "{album[id]} - {album[title]}") archive_fmt = "a_{album[id]}_{asset[id]}" pattern = (r"(?:https?://)?(?:(?:www\.)?artstation\.com" r"/(?!artwork|projects|search)([^/?#]+)" r"|((?!www)[\w-]+)\.artstation\.com)/albums/(\d+)") example = "https://www.artstation.com/USER/albums/12345" def __init__(self, match): ArtstationExtractor.__init__(self, match) self.album_id = text.parse_int(match[3]) def metadata(self): userinfo = self.get_user_info(self.user) album = None for album in userinfo["albums_with_community_projects"]: if album["id"] == self.album_id: break else: raise exception.NotFoundError("album") return { "userinfo": userinfo, "album": album } def projects(self): url = f"{self.root}/users/{self.user}/projects.json" params = {"album_id": self.album_id} return self._pagination(url, params) class ArtstationLikesExtractor(ArtstationExtractor): """Extractor for liked projects of an artstation user""" subcategory = "likes" directory_fmt = ("{category}", "{userinfo[username]}", "Likes") archive_fmt = "f_{userinfo[id]}_{asset[id]}" pattern = (r"(?:https?://)?(?:www\.)?artstation\.com" r"/(?!artwork|projects|search)([^/?#]+)/likes") example = "https://www.artstation.com/USER/likes" def projects(self): url = f"{self.root}/users/{self.user}/likes.json" return self._pagination(url) class ArtstationCollectionExtractor(ArtstationExtractor): """Extractor for an artstation collection""" subcategory = "collection" directory_fmt = ("{category}", "{user}", "{collection[id]} {collection[name]}") archive_fmt = "c_{collection[id]}_{asset[id]}" pattern = (r"(?:https?://)?(?:www\.)?artstation\.com" r"/(?!artwork|projects|search)([^/?#]+)/collections/(\d+)") example = "https://www.artstation.com/USER/collections/12345" def __init__(self, match): ArtstationExtractor.__init__(self, match) self.collection_id = match[2] def metadata(self): url = f"{self.root}/collections/{self.collection_id}.json" params = {"username": self.user} collection = self.request_json( url, params=params, notfound="collection") return {"collection": collection, "user": self.user} def 
projects(self): url = f"{self.root}/collections/{self.collection_id}/projects.json" params = {"collection_id": self.collection_id} return self._pagination(url, params) class ArtstationCollectionsExtractor(ArtstationExtractor): """Extractor for an artstation user's collections""" subcategory = "collections" pattern = (r"(?:https?://)?(?:www\.)?artstation\.com" r"/(?!artwork|projects|search)([^/?#]+)/collections/?$") example = "https://www.artstation.com/USER/collections" def items(self): url = self.root + "/collections.json" params = {"username": self.user} for collection in self.request_json( url, params=params, notfound="collections"): url = f"{self.root}/{self.user}/collections/{collection['id']}" collection["_extractor"] = ArtstationCollectionExtractor yield Message.Queue, url, collection class ArtstationChallengeExtractor(ArtstationExtractor): """Extractor for submissions of artstation challenges""" subcategory = "challenge" filename_fmt = "{submission_id}_{asset_id}_{filename}.{extension}" directory_fmt = ("{category}", "Challenges", "{challenge[id]} - {challenge[title]}") archive_fmt = "c_{challenge[id]}_{asset_id}" pattern = (r"(?:https?://)?(?:www\.)?artstation\.com" r"/contests/[^/?#]+/challenges/(\d+)" r"/?(?:\?sorting=([a-z]+))?") example = "https://www.artstation.com/contests/NAME/challenges/12345" def __init__(self, match): ArtstationExtractor.__init__(self, match) self.challenge_id = match[1] self.sorting = match[2] or "popular" def items(self): base = f"{self.root}/contests/_/challenges/{self.challenge_id}" challenge_url = f"{base}.json" submission_url = f"{base}/submissions.json" update_url = f"{self.root}/contests/submission_updates.json" challenge = self.request_json(challenge_url) yield Message.Directory, {"challenge": challenge} params = {"sorting": self.sorting} for submission in self._pagination(submission_url, params): params = {"submission_id": submission["id"]} for update in self._pagination(update_url, params=params): del update["replies"] update["challenge"] = challenge for url in text.extract_iter( update["body_presentation_html"], ' href="', '"'): update["asset_id"] = self._id_from_url(url) text.nameext_from_url(url, update) yield Message.Url, self._no_cache(url), update def _id_from_url(self, url): """Get an image's submission ID from its URL""" parts = url.split("/") return text.parse_int("".join(parts[7:10])) class ArtstationSearchExtractor(ArtstationExtractor): """Extractor for artstation search results""" subcategory = "search" directory_fmt = ("{category}", "Searches", "{search[query]}") archive_fmt = "s_{search[query]}_{asset[id]}" pattern = (r"(?:https?://)?(?:\w+\.)?artstation\.com" r"/search/?\?([^#]+)") example = "https://www.artstation.com/search?query=QUERY" def __init__(self, match): ArtstationExtractor.__init__(self, match) self.params = query = text.parse_query(match[1]) self.query = text.unquote(query.get("query") or query.get("q", "")) self.sorting = query.get("sort_by", "relevance").lower() self.tags = query.get("tags", "").split(",") def metadata(self): return {"search": { "query" : self.query, "sorting": self.sorting, "tags" : self.tags, }} def projects(self): filters = [] for key, value in self.params.items(): if key.endswith("_ids") or key == "tags": filters.append({ "field" : key, "method": "include", "value" : value.split(","), }) url = f"{self.root}/api/v2/search/projects.json" data = { "query" : self.query, "page" : None, "per_page" : 50, "sorting" : self.sorting, "pro_first" : ("1" if self.config("pro-first", True) else "0"), 
"filters" : filters, "additional_fields": (), } return self._pagination(url, json=data) class ArtstationArtworkExtractor(ArtstationExtractor): """Extractor for projects on artstation's artwork page""" subcategory = "artwork" directory_fmt = ("{category}", "Artworks", "{artwork[sorting]!c}") archive_fmt = "A_{asset[id]}" pattern = (r"(?:https?://)?(?:\w+\.)?artstation\.com" r"/artwork/?\?([^#]+)") example = "https://www.artstation.com/artwork?sorting=SORT" def __init__(self, match): ArtstationExtractor.__init__(self, match) self.query = text.parse_query(match[1]) def metadata(self): return {"artwork": self.query} def projects(self): url = f"{self.root}/projects.json" return self._pagination(url, self.query.copy()) class ArtstationImageExtractor(ArtstationExtractor): """Extractor for images from a single artstation project""" subcategory = "image" pattern = (r"(?:https?://)?(?:" r"(?:[\w-]+\.)?artstation\.com/(?:artwork|projects|search)" r"|artstn\.co/p)/(\w+)") example = "https://www.artstation.com/artwork/abcde" def __init__(self, match): ArtstationExtractor.__init__(self, match) self.project_id = match[1] self.assets = None def metadata(self): self.assets = list(ArtstationExtractor.get_project_assets( self, self.project_id)) try: self.user = self.assets[0]["user"]["username"] except IndexError: self.user = "" return ArtstationExtractor.metadata(self) def projects(self): return ({"hash_id": self.project_id},) def get_project_assets(self, project_id): return self.assets class ArtstationFollowingExtractor(ArtstationExtractor): """Extractor for a user's followed users""" subcategory = "following" pattern = (r"(?:https?://)?(?:www\.)?artstation\.com" r"/(?!artwork|projects|search)([^/?#]+)/following") example = "https://www.artstation.com/USER/following" def items(self): url = f"{self.root}/users/{self.user}/following.json" for user in self._pagination(url): url = f"{self.root}/{user['username']}" user["_extractor"] = ArtstationUserExtractor yield Message.Queue, url, user ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/aryion.py0000644000175000017500000002010515040344700020406 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2020-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://aryion.com/""" from .common import Extractor, Message from .. 
import text, util, exception from ..cache import cache from email.utils import parsedate_tz from datetime import datetime BASE_PATTERN = r"(?:https?://)?(?:www\.)?aryion\.com/g4" class AryionExtractor(Extractor): """Base class for aryion extractors""" category = "aryion" directory_fmt = ("{category}", "{user!l}", "{path:J - }") filename_fmt = "{id} {title}.{extension}" archive_fmt = "{id}" cookies_domain = ".aryion.com" cookies_names = ("phpbb3_rl7a3_sid",) root = "https://aryion.com" def __init__(self, match): Extractor.__init__(self, match) self.user = match[1] self.recursive = True def login(self): if self.cookies_check(self.cookies_names): return username, password = self._get_auth_info() if username: self.cookies_update(self._login_impl(username, password)) @cache(maxage=14*86400, keyarg=1) def _login_impl(self, username, password): self.log.info("Logging in as %s", username) url = self.root + "/forum/ucp.php?mode=login" data = { "username": username, "password": password, "login": "Login", } response = self.request(url, method="POST", data=data) if b"You have been successfully logged in." not in response.content: raise exception.AuthenticationError() return {c: response.cookies[c] for c in self.cookies_names} def items(self): self.login() data = self.metadata() for post_id in self.posts(): if post := self._parse_post(post_id): if data: post.update(data) yield Message.Directory, post yield Message.Url, post["url"], post elif post is False and self.recursive: base = self.root + "/g4/view/" data = {"_extractor": AryionPostExtractor} for post_id in self._pagination_params(base + post_id): yield Message.Queue, base + post_id, data def posts(self): """Yield relevant post IDs""" def metadata(self): """Return general metadata""" def _pagination_params(self, url, params=None, needle=None): if params is None: params = {"p": 1} else: params["p"] = text.parse_int(params.get("p"), 1) if needle is None: needle = "class='gallery-item' id='" while True: page = self.request(url, params=params).text cnt = 0 for post_id in text.extract_iter(page, needle, "'"): cnt += 1 yield post_id if cnt < 40: return params["p"] += 1 def _pagination_next(self, url): while True: page = self.request(url).text yield from text.extract_iter(page, "thumb' href='/g4/view/", "'") pos = page.find("Next >>") if pos < 0: return url = self.root + text.rextr(page, "href='", "'", pos) def _parse_post(self, post_id): url = f"{self.root}/g4/data.php?id={post_id}" with self.request(url, method="HEAD", fatal=False) as response: if response.status_code >= 400: self.log.warning( "Unable to fetch post %s ('%s %s')", post_id, response.status_code, response.reason) return None headers = response.headers # folder if headers["content-type"] in ( "application/x-folder", "application/x-comic-folder", "application/x-comic-folder-nomerge", ): return False # get filename from 'Content-Disposition' header cdis = headers["content-disposition"] fname, _, ext = text.extr(cdis, 'filename="', '"').rpartition(".") if not fname: fname, ext = ext, fname # get file size from 'Content-Length' header clen = headers.get("content-length") # fix 'Last-Modified' header lmod = headers["last-modified"] if lmod[22] != ":": lmod = f"{lmod[:22]}:{lmod[22:24]} GMT" post_url = f"{self.root}/g4/view/{post_id}" extr = text.extract_from(self.request(post_url).text) title, _, artist = text.unescape(extr( "g4 :: ", "<")).rpartition(" by ") return { "id" : text.parse_int(post_id), "url" : url, "user" : self.user or artist, "title" : title, "artist": artist, "path" : 
text.split_html(extr( "cookiecrumb'>", '</span'))[4:-1:2], "date" : datetime(*parsedate_tz(lmod)[:6]), "size" : text.parse_int(clen), "views" : text.parse_int(extr("Views</b>:", "<").replace(",", "")), "width" : text.parse_int(extr("Resolution</b>:", "x")), "height": text.parse_int(extr("", "<")), "comments" : text.parse_int(extr("Comments</b>:", "<")), "favorites": text.parse_int(extr("Favorites</b>:", "<")), "tags" : text.split_html(extr("class='taglist'>", "</span>")), "description": text.unescape(text.remove_html(extr( "<p>", "</p>"), "", "")), "filename" : fname, "extension": ext, "_http_lastmodified": lmod, } class AryionGalleryExtractor(AryionExtractor): """Extractor for a user's gallery on eka's portal""" subcategory = "gallery" categorytransfer = True pattern = BASE_PATTERN + r"/(?:gallery/|user/|latest.php\?name=)([^/?#]+)" example = "https://aryion.com/g4/gallery/USER" def __init__(self, match): AryionExtractor.__init__(self, match) self.offset = 0 def _init(self): self.recursive = self.config("recursive", True) def skip(self, num): if self.recursive: return 0 self.offset += num return num def posts(self): if self.recursive: url = f"{self.root}/g4/gallery/{self.user}" return self._pagination_params(url) else: url = f"{self.root}/g4/latest.php?name={self.user}" return util.advance(self._pagination_next(url), self.offset) class AryionFavoriteExtractor(AryionExtractor): """Extractor for a user's favorites gallery""" subcategory = "favorite" directory_fmt = ("{category}", "{user!l}", "favorites") archive_fmt = "f_{user}_{id}" categorytransfer = True pattern = BASE_PATTERN + r"/favorites/([^/?#]+)" example = "https://aryion.com/g4/favorites/USER" def posts(self): url = f"{self.root}/g4/favorites/{self.user}" return self._pagination_params(url, None, "data-item-id='") class AryionTagExtractor(AryionExtractor): """Extractor for tag searches on eka's portal""" subcategory = "tag" directory_fmt = ("{category}", "tags", "{search_tags}") archive_fmt = "t_{search_tags}_{id}" pattern = BASE_PATTERN + r"/tags\.php\?([^#]+)" example = "https://aryion.com/g4/tags.php?tag=TAG" def _init(self): self.params = text.parse_query(self.user) self.user = None def metadata(self): return {"search_tags": self.params.get("tag")} def posts(self): url = self.root + "/g4/tags.php" return self._pagination_params(url, self.params) class AryionPostExtractor(AryionExtractor): """Extractor for individual posts on eka's portal""" subcategory = "post" pattern = BASE_PATTERN + r"/view/(\d+)" example = "https://aryion.com/g4/view/12345" def posts(self): post_id, self.user = self.user, None return (post_id,) �����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� 
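# ---------------------------------------------------------------------------
# Illustrative sketch, not part of gallery-dl: `AryionExtractor._parse_post`
# above derives a post's "date" from the HTTP 'Last-Modified' header via
# email.utils.parsedate_tz.  A minimal standalone example with a well-formed
# RFC 2822 timestamp; the header value below is an assumption, not taken from
# aryion.com.
from datetime import datetime
from email.utils import parsedate_tz

lmod = "Wed, 21 Oct 2015 07:28:00 GMT"
date = datetime(*parsedate_tz(lmod)[:6])  # first six fields: Y, m, d, H, M, S
assert date == datetime(2015, 10, 21, 7, 28)
# ---------------------------------------------------------------------------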
x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1753336256.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.30.2/gallery_dl/extractor/batoto.py����������������������������������������������������0000644�0001750�0001750�00000013163�15040344700�020403� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://bato.to/""" from .common import Extractor, ChapterExtractor, MangaExtractor from .. import text, util BASE_PATTERN = (r"(?:https?://)?(" r"(?:ba|d|f|h|j|m|w)to\.to|" r"(?:(?:manga|read)toto|batocomic|[xz]bato)\.(?:com|net|org)|" r"comiko\.(?:net|org)|" r"bat(?:otoo|o?two)\.com)") # https://rentry.co/batoto DOMAINS = { "dto.to", "fto.to", "hto.to", "jto.to", "mto.to", "wto.to", "xbato.com", "xbato.net", "xbato.org", "zbato.com", "zbato.net", "zbato.org", "readtoto.com", "readtoto.net", "readtoto.org", "batocomic.com", "batocomic.net", "batocomic.org", "batotoo.com", "batotwo.com", "comiko.net", "comiko.org", "battwo.com", } LEGACY_DOMAINS = { "bato.to", "mangatoto.com", "mangatoto.net", "mangatoto.org", } class BatotoBase(): """Base class for batoto extractors""" category = "batoto" root = "https://xbato.org" _warn_legacy = True def _init_root(self): domain = self.config("domain") if domain is None or domain in {"auto", "url"}: domain = self.groups[0] if domain in LEGACY_DOMAINS: if self._warn_legacy: BatotoBase._warn_legacy = False self.log.warning("Legacy domain '%s'", domain) elif domain == "nolegacy": domain = self.groups[0] if domain in LEGACY_DOMAINS: domain = "xbato.org" elif domain == "nowarn": domain = self.groups[0] self.root = "https://" + domain def request(self, url, **kwargs): kwargs["encoding"] = "utf-8" return Extractor.request(self, url, **kwargs) class BatotoChapterExtractor(BatotoBase, ChapterExtractor): """Extractor for batoto manga chapters""" archive_fmt = "{chapter_id}_{page}" pattern = BASE_PATTERN + r"/(?:title/[^/?#]+|chapter)/(\d+)" example = "https://xbato.org/title/12345-MANGA/54321" def __init__(self, match): ChapterExtractor.__init__(self, match, False) self._init_root() self.chapter_id = self.groups[1] self.page_url = f"{self.root}/title/0/{self.chapter_id}" def metadata(self, page): extr = text.extract_from(page) try: manga, info, _ = extr("<title>", "<").rsplit(" - ", 3) 
except ValueError: manga = info = None manga_id = text.extr( extr('rel="canonical" href="', '"'), "/title/", "/") if not manga: manga = extr('link-hover">', "<") info = text.remove_html(extr('link-hover">', "</")) info = text.unescape(info) match = util.re( r"(?i)(?:(?:Volume|S(?:eason)?)\s*(\d+)\s+)?" r"(?:Chapter|Episode)\s*(\d+)([\w.]*)").match(info) if match: volume, chapter, minor = match.groups() else: volume = chapter = 0 minor = "" return { "manga" : text.unescape(manga), "manga_id" : text.parse_int(manga_id), "chapter_url" : extr(self.chapter_id + "-ch_", '"'), "title" : text.unescape(text.remove_html(extr( "selected>", "</option")).partition(" : ")[2]), "volume" : text.parse_int(volume), "chapter" : text.parse_int(chapter), "chapter_minor" : minor, "chapter_string": info, "chapter_id" : text.parse_int(self.chapter_id), "date" : text.parse_timestamp(extr(' time="', '"')[:-3]), } def images(self, page): images_container = text.extr(page, 'pageOpts', ':[0,0]}"') images_container = text.unescape(images_container) return [ (url, None) for url in text.extract_iter(images_container, r"\"", r"\"") ] class BatotoMangaExtractor(BatotoBase, MangaExtractor): """Extractor for batoto manga""" reverse = False chapterclass = BatotoChapterExtractor pattern = (BASE_PATTERN + r"/(?:title/(\d+)[^/?#]*|series/(\d+)(?:/[^/?#]*)?)/?$") example = "https://xbato.org/title/12345-MANGA/" def __init__(self, match): MangaExtractor.__init__(self, match, False) self._init_root() self.manga_id = self.groups[1] or self.groups[2] self.page_url = f"{self.root}/title/{self.manga_id}" def chapters(self, page): extr = text.extract_from(page) if warning := extr(' class="alert alert-warning">', "</div>"): self.log.warning("'%s'", text.remove_html(warning)) data = { "manga_id": text.parse_int(self.manga_id), "manga" : text.unescape(extr( "<title>", "<").rpartition(" - ")[0]), } extr('<div data-hk="0-0-0-0"', "") results = [] while True: href = extr('<a href="/title/', '"') if not href: break chapter = href.rpartition("-ch_")[2] chapter, sep, minor = chapter.partition(".") data["chapter"] = text.parse_int(chapter) data["chapter_minor"] = sep + minor data["date"] = text.parse_datetime( extr('time="', '"'), "%Y-%m-%dT%H:%M:%S.%fZ") url = f"{self.root}/title/{href}" results.append((url, data.copy())) return results �������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1753336256.0 
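# ---------------------------------------------------------------------------
# Illustration (not part of the gallery-dl source): the chapter-info pattern
# used in BatotoChapterExtractor.metadata() above, applied to assumed info
# strings. gallery-dl compiles it through its util.re() wrapper; the plain
# stdlib 're' module is used here only to keep the sketch self-contained.
import re

pattern = re.compile(
    r"(?i)(?:(?:Volume|S(?:eason)?)\s*(\d+)\s+)?"
    r"(?:Chapter|Episode)\s*(\d+)([\w.]*)")

print(pattern.match("Volume 2 Chapter 15.5").groups())   # ('2', '15', '.5')
print(pattern.match("Episode 7").groups())                # (None, '7', '')
# ---------------------------------------------------------------------------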
����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.30.2/gallery_dl/extractor/bbc.py�������������������������������������������������������0000644�0001750�0001750�00000006530�15040344700�017641� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2021-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://bbc.co.uk/""" from .common import GalleryExtractor, Extractor, Message from .. import text, util BASE_PATTERN = r"(?:https?://)?(?:www\.)?bbc\.co\.uk(/programmes/" class BbcGalleryExtractor(GalleryExtractor): """Extractor for a programme gallery on bbc.co.uk""" category = "bbc" root = "https://www.bbc.co.uk" directory_fmt = ("{category}", "{path[0]}", "{path[1]}", "{path[2]}", "{path[3:]:J - /}") filename_fmt = "{num:>02}.{extension}" archive_fmt = "{programme}_{num}" pattern = BASE_PATTERN + r"[^/?#]+(?!/galleries)(?:/[^/?#]+)?)$" example = "https://www.bbc.co.uk/programmes/PATH" def metadata(self, page): data = self._extract_jsonld(page) return { "title": text.unescape(text.extr( page, "<h1>", "</h1>").rpartition("</span>")[2]), "description": text.unescape(text.extr( page, 'property="og:description" content="', '"')), "programme": self.page_url.split("/")[4], "path": list(util.unique_sequence( element["name"] for element in data["itemListElement"] )), } def images(self, page): width = self.config("width") width = width - width % 16 if width else 1920 dimensions = f"/{width}xn/" results = [] for img in text.extract_iter(page, 'class="gallery__thumbnail', ">"): src = text.extr(img, 'data-image-src="', '"') results.append(( src.replace("/320x180_b/", dimensions), { "title_image": text.unescape(text.extr( img, 'data-gallery-title="', '"')), "synopsis": text.unescape(text.extr( img, 'data-gallery-synopsis="', '"')), "_fallback": self._fallback_urls(src, width), }, )) return results def _fallback_urls(self, src, max_width): front, _, back = src.partition("/320x180_b/") for width in (1920, 1600, 1280, 976): if width < max_width: yield f"{front}/{width}xn/{back}" class BbcProgrammeExtractor(Extractor): """Extractor for all galleries of a bbc programme""" category = "bbc" subcategory = "programme" root = "https://www.bbc.co.uk" pattern = BASE_PATTERN + r"[^/?#]+/galleries)(?:/?\?page=(\d+))?" 
example = "https://www.bbc.co.uk/programmes/ID/galleries" def items(self): path, pnum = self.groups data = {"_extractor": BbcGalleryExtractor} params = {"page": text.parse_int(pnum, 1)} galleries_url = self.root + path while True: page = self.request(galleries_url, params=params).text for programme_id in text.extract_iter( page, '<a href="https://www.bbc.co.uk/programmes/', '"'): url = "https://www.bbc.co.uk/programmes/" + programme_id yield Message.Queue, url, data if 'rel="next"' not in page: return params["page"] += 1 ������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1753381746.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.30.2/gallery_dl/extractor/behance.py���������������������������������������������������0000644�0001750�0001750�00000040026�15040475562�020511� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2018-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.behance.net/""" from .common import Extractor, Message from .. 
import text, util, exception class BehanceExtractor(Extractor): """Base class for behance extractors""" category = "behance" root = "https://www.behance.net" request_interval = (2.0, 4.0) browser = "firefox" tls12 = False def _init(self): self._bcp = self.cookies.get("bcp", domain="www.behance.net") if not self._bcp: self._bcp = "4c34489d-914c-46cd-b44c-dfd0e661136d" self.cookies.set("bcp", self._bcp, domain="www.behance.net") def items(self): for gallery in self.galleries(): gallery["_extractor"] = BehanceGalleryExtractor yield Message.Queue, gallery["url"], self._update(gallery) def galleries(self): """Return all relevant gallery URLs""" def _request_graphql(self, endpoint, variables): url = self.root + "/v3/graphql" headers = { "Origin": self.root, "X-BCP" : self._bcp, "X-Requested-With": "XMLHttpRequest", } data = { "query" : GRAPHQL_QUERIES[endpoint], "variables": variables, } return self.request_json( url, method="POST", headers=headers, json=data)["data"] def _update(self, data): # compress data to simple lists if (fields := data.get("fields")) and isinstance(fields[0], dict): data["fields"] = [ field.get("name") or field.get("label") for field in fields ] data["owners"] = [ owner.get("display_name") or owner.get("displayName") for owner in data["owners"] ] tags = data.get("tags") or () if tags and isinstance(tags[0], dict): tags = [tag["title"] for tag in tags] data["tags"] = tags data["date"] = text.parse_timestamp( data.get("publishedOn") or data.get("conceived_on") or 0) if creator := data.get("creator"): creator["name"] = creator["url"].rpartition("/")[2] # backwards compatibility data["gallery_id"] = data["id"] data["title"] = data["name"] data["user"] = ", ".join(data["owners"]) return data class BehanceGalleryExtractor(BehanceExtractor): """Extractor for image galleries from www.behance.net""" subcategory = "gallery" directory_fmt = ("{category}", "{owners:J, }", "{id} {name}") filename_fmt = "{category}_{id}_{num:>02}.{extension}" archive_fmt = "{id}_{num}" pattern = r"(?:https?://)?(?:www\.)?behance\.net/gallery/(\d+)" example = "https://www.behance.net/gallery/12345/TITLE" def __init__(self, match): BehanceExtractor.__init__(self, match) self.gallery_id = match[1] def _init(self): BehanceExtractor._init(self) if modules := self.config("modules"): if isinstance(modules, str): modules = modules.split(",") self.modules = set(modules) else: self.modules = {"image", "video", "mediacollection", "embed"} def items(self): data = self.get_gallery_data() imgs = self.get_images(data) data["count"] = len(imgs) yield Message.Directory, data for data["num"], (url, module) in enumerate(imgs, 1): data["module"] = module data["extension"] = (module.get("extension") or text.ext_from_url(url)) yield Message.Url, url, data def get_gallery_data(self): """Collect gallery info dict""" url = f"{self.root}/gallery/{self.gallery_id}/a" cookies = { "gk_suid": "14118261", "gki": "feature_3_in_1_checkout_test:false,hire_browse_get_quote_c" "ta_ab_test:false,feature_hire_dashboard_services_ab_test:f" "alse,feature_show_details_jobs_row_ab_test:false,feature_a" "i_freelance_project_create_flow:false,", "ilo0": "true", "originalReferrer": "", } page = self.request(url, cookies=cookies).text data = util.json_loads(text.extr( page, 'id="beconfig-store_state">', '</script>')) return self._update(data["project"]["project"]) def get_images(self, data): """Extract image results from an API response""" if not data["modules"]: access = data.get("matureAccess") if access == "logged-out": raise 
exception.AuthorizationError( "Mature content galleries require logged-in cookies") if access == "restricted-safe": raise exception.AuthorizationError( "Mature content blocked in account settings") if access and access != "allowed": raise exception.AuthorizationError() return () results = [] for module in data["modules"]: mtype = module["__typename"][:-6].lower() if mtype not in self.modules: self.log.debug("Skipping '%s' module", mtype) continue if mtype == "image": sizes = { size["url"].rsplit("/", 2)[1]: size for size in module["imageSizes"]["allAvailable"] } size = (sizes.get("source") or sizes.get("max_3840") or sizes.get("fs") or sizes.get("hd") or sizes.get("disp")) results.append((size["url"], module)) elif mtype == "video": try: url = text.extr(module["embed"], 'src="', '"') page = self.request(text.unescape(url)).text url = text.extr(page, '<source src="', '"') if text.ext_from_url(url) == "m3u8": url = "ytdl:" + url module["_ytdl_manifest"] = "hls" module["extension"] = "mp4" results.append((url, module)) continue except Exception as exc: self.log.debug("%s: %s", exc.__class__.__name__, exc) try: renditions = module["videoData"]["renditions"] except Exception: self.log.warning("No download URLs for video %s", module.get("id") or "???") continue try: url = [ r["url"] for r in renditions if text.ext_from_url(r["url"]) != "m3u8" ][-1] except Exception as exc: self.log.debug("%s: %s", exc.__class__.__name__, exc) url = "ytdl:" + renditions[-1]["url"] results.append((url, module)) elif mtype == "mediacollection": for component in module["components"]: for size in component["imageSizes"].values(): if size: parts = size["url"].split("/") parts[4] = "source" results.append(("/".join(parts), module)) break elif mtype == "embed": if embed := (module.get("originalEmbed") or module.get("fluidEmbed")): embed = text.unescape(text.extr(embed, 'src="', '"')) module["extension"] = "mp4" results.append(("ytdl:" + embed, module)) elif mtype == "text": module["extension"] = "txt" results.append(("text:" + module["text"], module)) return results class BehanceUserExtractor(BehanceExtractor): """Extractor for a user's galleries from www.behance.net""" subcategory = "user" categorytransfer = True pattern = r"(?:https?://)?(?:www\.)?behance\.net/([^/?#]+)/?$" example = "https://www.behance.net/USER" def __init__(self, match): BehanceExtractor.__init__(self, match) self.user = match[1] def galleries(self): endpoint = "GetProfileProjects" variables = { "username": self.user, "after" : "MAo=", # "0" in base64 } while True: data = self._request_graphql(endpoint, variables) items = data["user"]["profileProjects"] yield from items["nodes"] if not items["pageInfo"]["hasNextPage"]: return variables["after"] = items["pageInfo"]["endCursor"] class BehanceCollectionExtractor(BehanceExtractor): """Extractor for a collection's galleries from www.behance.net""" subcategory = "collection" categorytransfer = True pattern = r"(?:https?://)?(?:www\.)?behance\.net/collection/(\d+)" example = "https://www.behance.net/collection/12345/TITLE" def __init__(self, match): BehanceExtractor.__init__(self, match) self.collection_id = match[1] def galleries(self): endpoint = "GetMoodboardItemsAndRecommendations" variables = { "afterItem": "MAo=", # "0" in base64 "firstItem": 40, "id" : int(self.collection_id), "shouldGetItems" : True, "shouldGetMoodboardFields": False, "shouldGetRecommendations": False, } while True: data = self._request_graphql(endpoint, variables) items = data["moodboard"]["items"] for node in items["nodes"]: 
yield node["entity"] if not items["pageInfo"]["hasNextPage"]: return variables["afterItem"] = items["pageInfo"]["endCursor"] GRAPHQL_QUERIES = { "GetProfileProjects": """\ query GetProfileProjects($username: String, $after: String) { user(username: $username) { profileProjects(first: 12, after: $after) { pageInfo { endCursor hasNextPage } nodes { __typename adminFlags { mature_lock privacy_lock dmca_lock flagged_lock privacy_violation_lock trademark_lock spam_lock eu_ip_lock } colors { r g b } covers { size_202 { url } size_404 { url } size_808 { url } } features { url name featuredOn ribbon { image image2x image3x } } fields { id label slug url } hasMatureContent id isFeatured isHiddenFromWorkTab isMatureReviewSubmitted isOwner isFounder isPinnedToSubscriptionOverview isPrivate linkedAssets { ...sourceLinkFields } linkedAssetsCount sourceFiles { ...sourceFileFields } matureAccess modifiedOn name owners { ...OwnerFields images { size_50 { url } } } premium publishedOn stats { appreciations { all } views { all } comments { all } } slug tools { id title category categoryLabel categoryId approved url backgroundColor } url } } } } fragment sourceFileFields on SourceFile { __typename sourceFileId projectId userId title assetId renditionUrl mimeType size category licenseType unitAmount currency tier hidden extension hasUserPurchased } fragment sourceLinkFields on LinkedAsset { __typename name premium url category licenseType } fragment OwnerFields on User { displayName hasPremiumAccess id isFollowing isProfileOwner location locationUrl url username availabilityInfo { availabilityTimeline isAvailableFullTime isAvailableFreelance } } """, "GetMoodboardItemsAndRecommendations": """\ query GetMoodboardItemsAndRecommendations( $id: Int! $firstItem: Int! $afterItem: String $shouldGetRecommendations: Boolean! $shouldGetItems: Boolean! $shouldGetMoodboardFields: Boolean! ) { viewer @include(if: $shouldGetMoodboardFields) { isOptedOutOfRecommendations isAdmin } moodboard(id: $id) { ...moodboardFields @include(if: $shouldGetMoodboardFields) items(first: $firstItem, after: $afterItem) @include(if: $shouldGetItems) { pageInfo { endCursor hasNextPage } nodes { ...nodesFields } } recommendedItems(first: 80) @include(if: $shouldGetRecommendations) { nodes { ...nodesFields fetchSource } } } } fragment moodboardFields on Moodboard { id label privacy followerCount isFollowing projectCount url isOwner owners { ...OwnerFields images { size_50 { url } size_100 { url } size_115 { url } size_230 { url } size_138 { url } size_276 { url } } } } fragment projectFields on Project { __typename id isOwner publishedOn matureAccess hasMatureContent modifiedOn name url isPrivate slug license { license description id label url text images } fields { label } colors { r g b } owners { ...OwnerFields images { size_50 { url } size_100 { url } size_115 { url } size_230 { url } size_138 { url } size_276 { url } } } covers { size_original { url } size_max_808 { url } size_808 { url } size_404 { url } size_202 { url } size_230 { url } size_115 { url } } stats { views { all } appreciations { all } comments { all } } } fragment exifDataValueFields on exifDataValue { id label value searchValue } fragment nodesFields on MoodboardItem { id entityType width height flexWidth flexHeight images { size url } entity { ... on Project { ...projectFields } ... 
on ImageModule { project { ...projectFields } colors { r g b } exifData { lens { ...exifDataValueFields } software { ...exifDataValueFields } makeAndModel { ...exifDataValueFields } focalLength { ...exifDataValueFields } iso { ...exifDataValueFields } location { ...exifDataValueFields } flash { ...exifDataValueFields } exposureMode { ...exifDataValueFields } shutterSpeed { ...exifDataValueFields } aperture { ...exifDataValueFields } } } ... on MediaCollectionComponent { project { ...projectFields } } } } fragment OwnerFields on User { displayName hasPremiumAccess id isFollowing isProfileOwner location locationUrl url username availabilityInfo { availabilityTimeline isAvailableFullTime isAvailableFreelance } } """, } ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1753381746.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.30.2/gallery_dl/extractor/bilibili.py��������������������������������������������������0000644�0001750�0001750�00000013256�15040475562�020710� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.bilibili.com/""" from .common import Extractor, Message from .. 
import text, util, exception class BilibiliExtractor(Extractor): """Base class for bilibili extractors""" category = "bilibili" root = "https://www.bilibili.com" request_interval = (3.0, 6.0) def _init(self): self.api = BilibiliAPI(self) def items(self): for article in self.articles(): article["_extractor"] = BilibiliArticleExtractor url = f"{self.root}/opus/{article['opus_id']}" yield Message.Queue, url, article def articles(self): return () class BilibiliArticleExtractor(BilibiliExtractor): """Extractor for a bilibili article""" subcategory = "article" pattern = (r"(?:https?://)?" r"(?:t\.bilibili\.com|(?:www\.)?bilibili.com/opus)/(\d+)") example = "https://www.bilibili.com/opus/12345" directory_fmt = ("{category}", "{username}") filename_fmt = "{id}_{num}.{extension}" archive_fmt = "{id}_{num}" def items(self): article_id = self.groups[0] article = self.api.article(article_id) # Flatten modules list modules = {} for module in article["detail"]["modules"]: if module["module_type"] == "MODULE_TYPE_BLOCKED": self.log.warning("%s: Blocked Article\n%s", article_id, module["module_blocked"].get("hint_message")) del module["module_type"] modules.update(module) article["detail"]["modules"] = modules article["username"] = modules["module_author"]["name"] pics = [] if "module_top" in modules: try: pics.extend(modules["module_top"]["display"]["album"]["pics"]) except Exception: pass if "module_content" in modules: for paragraph in modules["module_content"]["paragraphs"]: if "pic" not in paragraph: continue try: pics.extend(paragraph["pic"]["pics"]) except Exception: pass article["count"] = len(pics) yield Message.Directory, article for article["num"], pic in enumerate(pics, 1): url = pic["url"] article.update(pic) yield Message.Url, url, text.nameext_from_url(url, article) class BilibiliUserArticlesExtractor(BilibiliExtractor): """Extractor for a bilibili user's articles""" subcategory = "user-articles" pattern = (r"(?:https?://)?space\.bilibili\.com/(\d+)" r"/(?:article|upload/opus)") example = "https://space.bilibili.com/12345/article" def articles(self): return self.api.user_articles(self.groups[0]) class BilibiliUserArticlesFavoriteExtractor(BilibiliExtractor): subcategory = "user-articles-favorite" pattern = (r"(?:https?://)?space\.bilibili\.com" r"/(\d+)/favlist\?fid=opus") example = "https://space.bilibili.com/12345/favlist?fid=opus" _warning = True def articles(self): if self._warning: if not self.cookies_check(("SESSDATA",)): self.log.error("'SESSDATA' cookie required") BilibiliUserArticlesFavoriteExtractor._warning = False return self.api.user_favlist() class BilibiliAPI(): def __init__(self, extractor): self.extractor = extractor def _call(self, endpoint, params): url = "https://api.bilibili.com/x/polymer/web-dynamic/v1" + endpoint data = self.extractor.request_json(url, params=params) if data["code"]: self.extractor.log.debug("Server response: %s", data) raise exception.AbortExtraction("API request failed") return data def user_articles(self, user_id): endpoint = "/opus/feed/space" params = {"host_mid": user_id} while True: data = self._call(endpoint, params) for item in data["data"]["items"]: params["offset"] = item["opus_id"] yield item if not data["data"]["has_more"]: break def article(self, article_id): url = "https://www.bilibili.com/opus/" + article_id while True: page = self.extractor.request(url).text try: return util.json_loads(text.extr( page, "window.__INITIAL_STATE__=", "};") + "}") except Exception: if "window._riskdata_" not in page: raise exception.AbortExtraction( 
f"{article_id}: Unable to extract INITIAL_STATE data") self.extractor.wait(seconds=300) def user_favlist(self): endpoint = "/opus/feed/fav" params = {"page": 1, "page_size": 20} while True: data = self._call(endpoint, params)["data"] yield from data["items"] if not data.get("has_more"): break params["page"] += 1 def login_user_id(self): url = "https://api.bilibili.com/x/space/v2/myinfo" data = self.extractor.request_json(url) if data["code"] != 0: self.extractor.log.debug("Server response: %s", data) raise exception.AbortExtraction( "API request failed. Are you logges in?") try: return data["data"]["profile"]["mid"] except Exception: raise exception.AbortExtraction("API request failed") ��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1753336256.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.30.2/gallery_dl/extractor/blogger.py���������������������������������������������������0000644�0001750�0001750�00000014134�15040344700�020533� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2019-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for Blogger blogs""" from .common import BaseExtractor, Message from .. 
import text, util def original(url): return (util.re(r"(/|=)(?:[sw]\d+|w\d+-h\d+)(?=/|$)") .sub(r"\1s0", url) .replace("http:", "https:", 1)) class BloggerExtractor(BaseExtractor): """Base class for blogger extractors""" basecategory = "blogger" directory_fmt = ("blogger", "{blog[name]}", "{post[date]:%Y-%m-%d} {post[title]}") filename_fmt = "{num:>03}.{extension}" archive_fmt = "{post[id]}_{num}" def _init(self): self.api = BloggerAPI(self) self.blog = self.root.rpartition("/")[2] self.videos = self.config("videos", True) def items(self): blog = self.api.blog_by_url("http://" + self.blog) blog["pages"] = blog["pages"]["totalItems"] blog["posts"] = blog["posts"]["totalItems"] blog["date"] = text.parse_datetime(blog["published"]) del blog["selfLink"] findall_image = util.re( r'src="(https?://(?:' r'blogger\.googleusercontent\.com/img|' r'lh\d+(?:-\w+)?\.googleusercontent\.com|' r'\d+\.bp\.blogspot\.com)/[^"]+)').findall findall_video = util.re( r'src="(https?://www\.blogger\.com/video\.g\?token=[^"]+)').findall metadata = self.metadata() for post in self.posts(blog): content = post["content"] files = findall_image(content) for idx, url in enumerate(files): files[idx] = original(url) if self.videos and 'id="BLOG_video-' in content: page = self.request(post["url"]).text for url in findall_video(page): page = self.request(url).text video_config = util.json_loads(text.extr( page, 'var VIDEO_CONFIG =', '\n')) files.append(max( video_config["streams"], key=lambda x: x["format_id"], )["play_url"]) post["author"] = post["author"]["displayName"] post["replies"] = post["replies"]["totalItems"] post["content"] = text.remove_html(content) post["date"] = text.parse_datetime(post["published"]) del post["selfLink"] del post["blog"] data = {"blog": blog, "post": post} if metadata: data.update(metadata) yield Message.Directory, data for data["num"], url in enumerate(files, 1): data["url"] = url yield Message.Url, url, text.nameext_from_url(url, data) def posts(self, blog): """Return an iterable with all relevant post objects""" def metadata(self): """Return additional metadata""" BASE_PATTERN = BloggerExtractor.update({ "blogspot": { "root": None, "pattern": r"[\w-]+\.blogspot\.com", }, }) class BloggerPostExtractor(BloggerExtractor): """Extractor for a single blog post""" subcategory = "post" pattern = BASE_PATTERN + r"(/\d\d\d\d/\d\d/[^/?#]+\.html)" example = "https://BLOG.blogspot.com/1970/01/TITLE.html" def posts(self, blog): return (self.api.post_by_path(blog["id"], self.groups[-1]),) class BloggerBlogExtractor(BloggerExtractor): """Extractor for an entire Blogger blog""" subcategory = "blog" pattern = BASE_PATTERN + r"/?$" example = "https://BLOG.blogspot.com/" def posts(self, blog): return self.api.blog_posts(blog["id"]) class BloggerSearchExtractor(BloggerExtractor): """Extractor for Blogger search resuls""" subcategory = "search" pattern = BASE_PATTERN + r"/search/?\?q=([^&#]+)" example = "https://BLOG.blogspot.com/search?q=QUERY" def metadata(self): self.query = query = text.unquote(self.groups[-1]) return {"query": query} def posts(self, blog): return self.api.blog_search(blog["id"], self.query) class BloggerLabelExtractor(BloggerExtractor): """Extractor for Blogger posts by label""" subcategory = "label" pattern = BASE_PATTERN + r"/search/label/([^/?#]+)" example = "https://BLOG.blogspot.com/search/label/LABEL" def metadata(self): self.label = label = text.unquote(self.groups[-1]) return {"label": label} def posts(self, blog): return self.api.blog_posts(blog["id"], self.label) class 
BloggerAPI(): """Minimal interface for the Blogger API v3 https://developers.google.com/blogger """ API_KEY = "AIzaSyCN9ax34oMMyM07g_M-5pjeDp_312eITK8" def __init__(self, extractor): self.extractor = extractor self.api_key = extractor.config("api-key") or self.API_KEY def blog_by_url(self, url): return self._call("/blogs/byurl", {"url": url}, "blog") def blog_posts(self, blog_id, label=None): endpoint = f"/blogs/{blog_id}/posts" params = {"labels": label} return self._pagination(endpoint, params) def blog_search(self, blog_id, query): endpoint = f"/blogs/{blog_id}/posts/search" params = {"q": query} return self._pagination(endpoint, params) def post_by_path(self, blog_id, path): endpoint = f"/blogs/{blog_id}/posts/bypath" return self._call(endpoint, {"path": path}, "post") def _call(self, endpoint, params, notfound=None): url = "https://www.googleapis.com/blogger/v3" + endpoint params["key"] = self.api_key return self.extractor.request_json( url, params=params, notfound=notfound) def _pagination(self, endpoint, params): while True: data = self._call(endpoint, params) if "items" in data: yield from data["items"] if "nextPageToken" not in data: return params["pageToken"] = data["nextPageToken"] ������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1753336256.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.30.2/gallery_dl/extractor/bluesky.py���������������������������������������������������0000644�0001750�0001750�00000047036�15040344700�020577� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2024-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://bsky.app/""" from .common import Extractor, Message, Dispatch from .. 
import text, util, exception from ..cache import cache, memcache BASE_PATTERN = (r"(?:https?://)?" r"(?:(?:www\.)?(?:c|[fv]x)?bs[ky]y[ex]?\.app|main\.bsky\.dev)") USER_PATTERN = BASE_PATTERN + r"/profile/([^/?#]+)" class BlueskyExtractor(Extractor): """Base class for bluesky extractors""" category = "bluesky" directory_fmt = ("{category}", "{author[handle]}") filename_fmt = "{createdAt[:19]}_{post_id}_{num}.{extension}" archive_fmt = "{filename}" root = "https://bsky.app" def _init(self): if meta := self.config("metadata") or (): if isinstance(meta, str): meta = meta.replace(" ", "").split(",") elif not isinstance(meta, (list, tuple)): meta = ("user", "facets") self._metadata_user = ("user" in meta) self._metadata_facets = ("facets" in meta) self.api = BlueskyAPI(self) self._user = self._user_did = None self.instance = self.root.partition("://")[2] self.videos = self.config("videos", True) self.quoted = self.config("quoted", False) def items(self): for post in self.posts(): if "post" in post: post = post["post"] if self._user_did and post["author"]["did"] != self._user_did: self.log.debug("Skipping %s (repost)", self._pid(post)) continue embed = post.get("embed") try: post.update(post.pop("record")) except Exception: self.log.debug("Skipping %s (no 'record')", self._pid(post)) continue while True: self._prepare(post) files = self._extract_files(post) yield Message.Directory, post if files: did = post["author"]["did"] base = (f"{self.api.service_endpoint(did)}/xrpc" f"/com.atproto.sync.getBlob?did={did}&cid=") for post["num"], file in enumerate(files, 1): post.update(file) yield Message.Url, base + file["filename"], post if not self.quoted or not embed or "record" not in embed: break quote = embed["record"] if "record" in quote: quote = quote["record"] value = quote.pop("value", None) if value is None: break quote["quote_id"] = self._pid(post) quote["quote_by"] = post["author"] embed = quote.get("embed") quote.update(value) post = quote def posts(self): return () def _posts_records(self, actor, collection): depth = self.config("depth", "0") for record in self.api.list_records(actor, collection): uri = None try: uri = record["value"]["subject"]["uri"] if "/app.bsky.feed.post/" in uri: yield from self.api.get_post_thread_uri(uri, depth) except exception.ControlException: pass # deleted post except Exception as exc: self.log.debug(record, exc_info=exc) self.log.warning("Failed to extract %s (%s: %s)", uri or "record", exc.__class__.__name__, exc) def _pid(self, post): return post["uri"].rpartition("/")[2] @memcache(keyarg=1) def _instance(self, handle): return ".".join(handle.rsplit(".", 2)[-2:]) def _prepare(self, post): author = post["author"] author["instance"] = self._instance(author["handle"]) if self._metadata_facets: if "facets" in post: post["hashtags"] = tags = [] post["mentions"] = dids = [] post["uris"] = uris = [] for facet in post["facets"]: features = facet["features"][0] if "tag" in features: tags.append(features["tag"]) elif "did" in features: dids.append(features["did"]) elif "uri" in features: uris.append(features["uri"]) else: post["hashtags"] = post["mentions"] = post["uris"] = () if self._metadata_user: post["user"] = self._user or author post["instance"] = self.instance post["post_id"] = self._pid(post) post["date"] = text.parse_datetime( post["createdAt"][:19], "%Y-%m-%dT%H:%M:%S") def _extract_files(self, post): if "embed" not in post: post["count"] = 0 return () files = [] media = post["embed"] if "media" in media: media = media["media"] if "images" in media: for image 
in media["images"]: files.append(self._extract_media(image, "image")) if "video" in media and self.videos: files.append(self._extract_media(media, "video")) post["count"] = len(files) return files def _extract_media(self, media, key): try: aspect = media["aspectRatio"] width = aspect["width"] height = aspect["height"] except KeyError: width = height = 0 data = media[key] try: cid = data["ref"]["$link"] except KeyError: cid = data["cid"] return { "description": media.get("alt") or "", "width" : width, "height" : height, "filename" : cid, "extension" : data["mimeType"].rpartition("/")[2], } def _make_post(self, actor, kind): did = self.api._did_from_actor(actor) profile = self.api.get_profile(did) if kind not in profile: return () cid = profile[kind].rpartition("/")[2].partition("@")[0] return ({ "post": { "embed": {"images": [{ "alt": kind, "image": { "$type" : "blob", "ref" : {"$link": cid}, "mimeType": "image/jpeg", "size" : 0, }, "aspectRatio": { "width" : 1000, "height": 1000, }, }]}, "author" : profile, "record" : (), "createdAt": "", "uri" : cid, }, },) class BlueskyUserExtractor(Dispatch, BlueskyExtractor): pattern = USER_PATTERN + r"$" example = "https://bsky.app/profile/HANDLE" def items(self): base = f"{self.root}/profile/{self.groups[0]}/" default = ("posts" if self.config("quoted", False) or self.config("reposts", False) else "media") return self._dispatch_extractors(( (BlueskyInfoExtractor , base + "info"), (BlueskyAvatarExtractor , base + "avatar"), (BlueskyBackgroundExtractor, base + "banner"), (BlueskyPostsExtractor , base + "posts"), (BlueskyRepliesExtractor , base + "replies"), (BlueskyMediaExtractor , base + "media"), (BlueskyVideoExtractor , base + "video"), (BlueskyLikesExtractor , base + "likes"), ), (default,)) class BlueskyPostsExtractor(BlueskyExtractor): subcategory = "posts" pattern = USER_PATTERN + r"/posts" example = "https://bsky.app/profile/HANDLE/posts" def posts(self): return self.api.get_author_feed( self.groups[0], "posts_and_author_threads") class BlueskyRepliesExtractor(BlueskyExtractor): subcategory = "replies" pattern = USER_PATTERN + r"/replies" example = "https://bsky.app/profile/HANDLE/replies" def posts(self): return self.api.get_author_feed( self.groups[0], "posts_with_replies") class BlueskyMediaExtractor(BlueskyExtractor): subcategory = "media" pattern = USER_PATTERN + r"/media" example = "https://bsky.app/profile/HANDLE/media" def posts(self): return self.api.get_author_feed( self.groups[0], "posts_with_media") class BlueskyVideoExtractor(BlueskyExtractor): subcategory = "video" pattern = USER_PATTERN + r"/video" example = "https://bsky.app/profile/HANDLE/video" def posts(self): return self.api.get_author_feed( self.groups[0], "posts_with_video") class BlueskyLikesExtractor(BlueskyExtractor): subcategory = "likes" pattern = USER_PATTERN + r"/likes" example = "https://bsky.app/profile/HANDLE/likes" def posts(self): if self.config("endpoint") == "getActorLikes": return self.api.get_actor_likes(self.groups[0]) return self._posts_records(self.groups[0], "app.bsky.feed.like") class BlueskyFeedExtractor(BlueskyExtractor): subcategory = "feed" pattern = USER_PATTERN + r"/feed/([^/?#]+)" example = "https://bsky.app/profile/HANDLE/feed/NAME" def posts(self): actor, feed = self.groups return self.api.get_feed(actor, feed) class BlueskyListExtractor(BlueskyExtractor): subcategory = "list" pattern = USER_PATTERN + r"/lists/([^/?#]+)" example = "https://bsky.app/profile/HANDLE/lists/ID" def posts(self): actor, list_id = self.groups return 
self.api.get_list_feed(actor, list_id) class BlueskyFollowingExtractor(BlueskyExtractor): subcategory = "following" pattern = USER_PATTERN + r"/follows" example = "https://bsky.app/profile/HANDLE/follows" def items(self): for user in self.api.get_follows(self.groups[0]): url = "https://bsky.app/profile/" + user["did"] user["_extractor"] = BlueskyUserExtractor yield Message.Queue, url, user class BlueskyPostExtractor(BlueskyExtractor): subcategory = "post" pattern = USER_PATTERN + r"/post/([^/?#]+)" example = "https://bsky.app/profile/HANDLE/post/ID" def posts(self): actor, post_id = self.groups return self.api.get_post_thread(actor, post_id) class BlueskyInfoExtractor(BlueskyExtractor): subcategory = "info" pattern = USER_PATTERN + r"/info" example = "https://bsky.app/profile/HANDLE/info" def items(self): self._metadata_user = True self.api._did_from_actor(self.groups[0]) return iter(((Message.Directory, self._user),)) class BlueskyAvatarExtractor(BlueskyExtractor): subcategory = "avatar" filename_fmt = "avatar_{post_id}.{extension}" pattern = USER_PATTERN + r"/avatar" example = "https://bsky.app/profile/HANDLE/avatar" def posts(self): return self._make_post(self.groups[0], "avatar") class BlueskyBackgroundExtractor(BlueskyExtractor): subcategory = "background" filename_fmt = "background_{post_id}.{extension}" pattern = USER_PATTERN + r"/ba(?:nner|ckground)" example = "https://bsky.app/profile/HANDLE/banner" def posts(self): return self._make_post(self.groups[0], "banner") class BlueskySearchExtractor(BlueskyExtractor): subcategory = "search" pattern = BASE_PATTERN + r"/search(?:/|\?q=)(.+)" example = "https://bsky.app/search?q=QUERY" def posts(self): query = text.unquote(self.groups[0].replace("+", " ")) return self.api.search_posts(query) class BlueskyHashtagExtractor(BlueskyExtractor): subcategory = "hashtag" pattern = BASE_PATTERN + r"/hashtag/([^/?#]+)(?:/(top|latest))?" 
example = "https://bsky.app/hashtag/NAME" def posts(self): hashtag, order = self.groups return self.api.search_posts("#"+hashtag, order) class BlueskyAPI(): """Interface for the Bluesky API https://docs.bsky.app/docs/category/http-reference """ def __init__(self, extractor): self.extractor = extractor self.log = extractor.log self.headers = {"Accept": "application/json"} self.username, self.password = extractor._get_auth_info() if self.username: self.root = "https://bsky.social" else: self.root = "https://api.bsky.app" self.authenticate = util.noop def get_actor_likes(self, actor): endpoint = "app.bsky.feed.getActorLikes" params = { "actor": self._did_from_actor(actor), "limit": "100", } return self._pagination(endpoint, params, check_empty=True) def get_author_feed(self, actor, filter="posts_and_author_threads"): endpoint = "app.bsky.feed.getAuthorFeed" params = { "actor" : self._did_from_actor(actor, True), "filter": filter, "limit" : "100", } return self._pagination(endpoint, params) def get_feed(self, actor, feed): endpoint = "app.bsky.feed.getFeed" uri = (f"at://{self._did_from_actor(actor)}" f"/app.bsky.feed.generator/{feed}") params = {"feed": uri, "limit": "100"} return self._pagination(endpoint, params) def get_follows(self, actor): endpoint = "app.bsky.graph.getFollows" params = { "actor": self._did_from_actor(actor), "limit": "100", } return self._pagination(endpoint, params, "follows") def get_list_feed(self, actor, list): endpoint = "app.bsky.feed.getListFeed" uri = f"at://{self._did_from_actor(actor)}/app.bsky.graph.list/{list}" params = {"list" : uri, "limit": "100"} return self._pagination(endpoint, params) def get_post_thread(self, actor, post_id): uri = (f"at://{self._did_from_actor(actor)}" f"/app.bsky.feed.post/{post_id}") depth = self.extractor.config("depth", "0") return self.get_post_thread_uri(uri, depth) def get_post_thread_uri(self, uri, depth="0"): endpoint = "app.bsky.feed.getPostThread" params = { "uri" : uri, "depth" : depth, "parentHeight": "0", } thread = self._call(endpoint, params)["thread"] if "replies" not in thread: return (thread,) index = 0 posts = [thread] while index < len(posts): post = posts[index] if "replies" in post: posts.extend(post["replies"]) index += 1 return posts @memcache(keyarg=1) def get_profile(self, did): endpoint = "app.bsky.actor.getProfile" params = {"actor": did} return self._call(endpoint, params) def list_records(self, actor, collection): endpoint = "com.atproto.repo.listRecords" actor_did = self._did_from_actor(actor) params = { "repo" : actor_did, "collection": collection, "limit" : "100", # "reverse" : "false", } return self._pagination(endpoint, params, "records", self.service_endpoint(actor_did)) @memcache(keyarg=1) def resolve_handle(self, handle): endpoint = "com.atproto.identity.resolveHandle" params = {"handle": handle} return self._call(endpoint, params)["did"] @memcache(keyarg=1) def service_endpoint(self, did): if did.startswith('did:web:'): url = "https://" + did[8:] + "/.well-known/did.json" else: url = "https://plc.directory/" + did try: data = self.extractor.request_json(url) for service in data["service"]: if service["type"] == "AtprotoPersonalDataServer": return service["serviceEndpoint"] except Exception: pass return "https://bsky.social" def search_posts(self, query, sort=None): endpoint = "app.bsky.feed.searchPosts" params = { "q" : query, "limit": "100", "sort" : sort, } return self._pagination(endpoint, params, "posts") def _did_from_actor(self, actor, user_did=False): if actor.startswith("did:"): did = 
actor else: did = self.resolve_handle(actor) extr = self.extractor if user_did and not extr.config("reposts", False): extr._user_did = did if extr._metadata_user: extr._user = user = self.get_profile(did) user["instance"] = extr._instance(user["handle"]) return did def authenticate(self): self.headers["Authorization"] = self._authenticate_impl(self.username) @cache(maxage=3600, keyarg=1) def _authenticate_impl(self, username): refresh_token = _refresh_token_cache(username) if refresh_token: self.log.info("Refreshing access token for %s", username) endpoint = "com.atproto.server.refreshSession" headers = {"Authorization": "Bearer " + refresh_token} data = None else: self.log.info("Logging in as %s", username) endpoint = "com.atproto.server.createSession" headers = None data = { "identifier": username, "password" : self.password, } url = f"{self.root}/xrpc/{endpoint}" response = self.extractor.request( url, method="POST", headers=headers, json=data, fatal=None) data = response.json() if response.status_code != 200: self.log.debug("Server response: %s", data) raise exception.AuthenticationError( f"\"{data.get('error')}: {data.get('message')}\"") _refresh_token_cache.update(self.username, data["refreshJwt"]) return "Bearer " + data["accessJwt"] def _call(self, endpoint, params, root=None): if root is None: root = self.root url = f"{root}/xrpc/{endpoint}" while True: self.authenticate() response = self.extractor.request( url, params=params, headers=self.headers, fatal=None) if response.status_code < 400: return response.json() if response.status_code == 429: until = response.headers.get("RateLimit-Reset") self.extractor.wait(until=until) continue msg = "API request failed" try: data = response.json() msg = f"{msg} ('{data['error']}: {data['message']}')" except Exception: msg = f"{msg} ({response.status_code} {response.reason})" self.extractor.log.debug("Server response: %s", response.text) raise exception.AbortExtraction(msg) def _pagination(self, endpoint, params, key="feed", root=None, check_empty=False): while True: data = self._call(endpoint, params, root) if check_empty and not data[key]: return yield from data[key] cursor = data.get("cursor") if not cursor: return params["cursor"] = cursor @cache(maxage=84*86400, keyarg=0) def _refresh_token_cache(username): return None ��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1753336256.0 
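# ---------------------------------------------------------------------------
# Illustration (not part of the gallery-dl source): how BlueskyExtractor.items()
# above assembles a download URL from the PDS endpoint returned by
# BlueskyAPI.service_endpoint(), the author DID, and the blob CID stored in the
# file's 'filename' field. All three values below are made-up placeholders.
endpoint = "https://pds.example.host"          # hypothetical PDS endpoint
did = "did:plc:exampleauthordid"               # hypothetical author DID
cid = "bafkreiexampleblobcid"                  # hypothetical blob CID
url = f"{endpoint}/xrpc/com.atproto.sync.getBlob?did={did}&cid={cid}"
print(url)
# ---------------------------------------------------------------------------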
gallery_dl-1.30.2/gallery_dl/extractor/booru.py
# -*- coding: utf-8 -*-

# Copyright 2015-2023 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for *booru sites"""

from .common import BaseExtractor, Message
from .. import text
import operator


class BooruExtractor(BaseExtractor):
    """Base class for *booru extractors"""
    basecategory = "booru"
    filename_fmt = "{category}_{id}_{md5}.{extension}"
    page_start = 0
    per_page = 100

    def items(self):
        self.login()

        data = self.metadata()
        tags = self.config("tags", False)
        notes = self.config("notes", False)
        fetch_html = tags or notes

        if url_key := self.config("url"):
            if isinstance(url_key, (list, tuple)):
                self._file_url = self._file_url_list
                self._file_url_keys = url_key
            else:
                self._file_url = operator.itemgetter(url_key)

        for post in self.posts():
            try:
                url = self._file_url(post)
                if url[0] == "/":
                    url = self.root + url
            except Exception as exc:
                self.log.debug("%s: %s", exc.__class__.__name__, exc)
                self.log.warning("Unable to fetch download URL for post %s "
                                 "(md5: %s)", post.get("id"), post.get("md5"))
                continue

            if fetch_html:
                html = self._html(post)
                if tags:
                    self._tags(post, html)
                if notes:
                    self._notes(post, html)

            text.nameext_from_url(url, post)
            post.update(data)
            self._prepare(post)

            yield Message.Directory, post
            yield Message.Url, url, post

    def skip(self, num):
        pages = num // self.per_page
        self.page_start += pages
        return pages * self.per_page

    def login(self):
        """Login and set necessary cookies"""

    def metadata(self):
        """Return a dict with general metadata"""
        return ()

    def posts(self):
        """Return an iterable with post objects"""
        return ()

    _file_url = operator.itemgetter("file_url")

    def _file_url_list(self, post):
        urls = (post[key] for key in self._file_url_keys if post.get(key))
        post["_fallback"] = it = iter(urls)
        return next(it)

    def _prepare(self, post):
        """Prepare a 'post's metadata"""

    def _html(self, post):
        """Return HTML content of a post"""

    def _tags(self, post, page):
        """Extract extended tag metadata"""

    def _notes(self, post, page):
        """Extract notes metadata"""
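# ---------------------------------------------------------------------------
# Illustration (not part of the gallery-dl source): what the 'url' option does
# when it is configured as a list of keys. BooruExtractor._file_url_list()
# above picks the first key that yields a value and exposes the remaining
# candidates through the post's '_fallback' iterator. The post dict and key
# names below are assumed examples.
post = {
    "file_url"      : "",                                        # empty -> skipped
    "large_file_url": "https://example.org/data/large/abc.jpg",
    "preview_url"   : "https://example.org/data/preview/abc.jpg",
}
keys = ("file_url", "large_file_url", "preview_url")

urls = (post[key] for key in keys if post.get(key))
fallback = iter(urls)
primary = next(fallback)
print(primary)              # https://example.org/data/large/abc.jpg
print(list(fallback))       # ['https://example.org/data/preview/abc.jpg']
# ---------------------------------------------------------------------------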
����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1753336256.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.30.2/gallery_dl/extractor/boosty.py����������������������������������������������������0000644�0001750�0001750�00000035121�15040344700�020430� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.boosty.to/""" from .common import Extractor, Message from .. 
import text, util, exception import itertools BASE_PATTERN = r"(?:https?://)?boosty\.to" class BoostyExtractor(Extractor): """Base class for boosty extractors""" category = "boosty" root = "https://www.boosty.to" directory_fmt = ("{category}", "{user[blogUrl]} ({user[id]})", "{post[date]:%Y-%m-%d} {post[int_id]}") filename_fmt = "{num:>02} {file[id]}.{extension}" archive_fmt = "{file[id]}" cookies_domain = ".boosty.to" cookies_names = ("auth",) def _init(self): self.api = BoostyAPI(self) self._user = None if self.config("metadata") else False self.only_allowed = self.config("allowed", True) self.only_bought = self.config("bought") videos = self.config("videos") if videos is None or videos: if isinstance(videos, str): videos = videos.split(",") elif not isinstance(videos, (list, tuple)): # ultra_hd: 2160p # quad_hd: 1440p # full_hd: 1080p # high: 720p # medium: 480p # low: 360p # lowest: 240p # tiny: 144p videos = ("ultra_hd", "quad_hd", "full_hd", "high", "medium", "low", "lowest", "tiny") self.videos = videos def items(self): for post in self.posts(): if not post.get("hasAccess"): self.log.warning("Not allowed to access post %s", post["id"]) continue files = self._extract_files(post) if self._user: post["user"] = self._user data = { "post" : post, "user" : post.pop("user", None), "count": len(files), } yield Message.Directory, data for data["num"], file in enumerate(files, 1): data["file"] = file url = file["url"] yield Message.Url, url, text.nameext_from_url(url, data) def posts(self): """Yield JSON content of all relevant posts""" def _extract_files(self, post): files = [] post["content"] = content = [] post["links"] = links = [] if "createdAt" in post: post["date"] = text.parse_timestamp(post["createdAt"]) for block in post["data"]: try: type = block["type"] if type == "text": if block["modificator"] == "BLOCK_END": continue c = util.json_loads(block["content"]) content.append(c[0]) elif type == "image": files.append(self._update_url(post, block)) elif type == "ok_video": if not self.videos: self.log.debug("%s: Skipping video %s", post["id"], block["id"]) continue fmts = { fmt["type"]: fmt["url"] for fmt in block["playerUrls"] if fmt["url"] } formats = [ fmts[fmt] for fmt in self.videos if fmt in fmts ] if formats: formats = iter(formats) block["url"] = next(formats) block["_fallback"] = formats files.append(block) else: self.log.warning( "%s: Found no suitable video format for %s", post["id"], block["id"]) elif type == "link": url = block["url"] links.append(url) content.append(url) elif type == "audio_file": files.append(self._update_url(post, block)) elif type == "file": files.append(self._update_url(post, block)) elif type == "smile": content.append(":" + block["name"] + ":") else: self.log.debug("%s: Unsupported data type '%s'", post["id"], type) except Exception as exc: self.log.debug("%s: %s", exc.__class__.__name__, exc) del post["data"] return files def _update_url(self, post, block): url = block["url"] sep = "&" if "?" in url else "?" 
if signed_query := post.get("signedQuery"): url += sep + signed_query[1:] sep = "&" migrated = post.get("isMigrated") if migrated is not None: url += sep + "is_migrated=" + str(migrated).lower() block["url"] = url return block class BoostyUserExtractor(BoostyExtractor): """Extractor for boosty.to user profiles""" subcategory = "user" pattern = BASE_PATTERN + r"/([^/?#]+)(?:\?([^#]+))?$" example = "https://boosty.to/USER" def posts(self): user, query = self.groups params = text.parse_query(query) if self._user is None: self._user = self.api.user(user) return self.api.blog_posts(user, params) class BoostyMediaExtractor(BoostyExtractor): """Extractor for boosty.to user media""" subcategory = "media" directory_fmt = "{category}", "{user[blogUrl]} ({user[id]})", "media" filename_fmt = "{post[id]}_{num}.{extension}" pattern = BASE_PATTERN + r"/([^/?#]+)/media/([^/?#]+)(?:\?([^#]+))?" example = "https://boosty.to/USER/media/all" def posts(self): user, media, query = self.groups params = text.parse_query(query) self._user = self.api.user(user) return self.api.blog_media_album(user, media, params) class BoostyFeedExtractor(BoostyExtractor): """Extractor for your boosty.to subscription feed""" subcategory = "feed" pattern = BASE_PATTERN + r"/(?:\?([^#]+))?(?:$|#)" example = "https://boosty.to/" def posts(self): params = text.parse_query(self.groups[0]) return self.api.feed_posts(params) class BoostyPostExtractor(BoostyExtractor): """Extractor for boosty.to posts""" subcategory = "post" pattern = BASE_PATTERN + r"/([^/?#]+)/posts/([0-9a-f-]+)" example = "https://boosty.to/USER/posts/01234567-89ab-cdef-0123-456789abcd" def posts(self): user, post_id = self.groups if self._user is None: self._user = self.api.user(user) return (self.api.post(user, post_id),) class BoostyFollowingExtractor(BoostyExtractor): """Extractor for your boosty.to subscribed users""" subcategory = "following" pattern = BASE_PATTERN + r"/app/settings/subscriptions" example = "https://boosty.to/app/settings/subscriptions" def items(self): for user in self.api.user_subscriptions(): url = f"{self.root}/{user['blog']['blogUrl']}" user["_extractor"] = BoostyUserExtractor yield Message.Queue, url, user class BoostyDirectMessagesExtractor(BoostyExtractor): """Extractor for boosty.to direct messages""" subcategory = "direct-messages" directory_fmt = ("{category}", "{user[blogUrl]} ({user[id]})", "Direct Messages") pattern = BASE_PATTERN + r"/app/messages/?\?dialogId=(\d+)" example = "https://boosty.to/app/messages?dialogId=12345" def items(self): """Yield direct messages from a given dialog ID.""" dialog_id = self.groups[0] response = self.api.dialog(dialog_id) signed_query = response.get("signedQuery") try: messages = response["messages"]["data"] offset = messages[0]["id"] except Exception: return try: user = self.api.user(response["chatmate"]["url"]) except Exception: user = None messages.reverse() for message in itertools.chain( messages, self.api.dialog_messages(dialog_id, offset=offset) ): message["signedQuery"] = signed_query files = self._extract_files(message) data = { "post": message, "user": user, "count": len(files), } yield Message.Directory, data for data["num"], file in enumerate(files, 1): data["file"] = file url = file["url"] yield Message.Url, url, text.nameext_from_url(url, data) class BoostyAPI(): """Interface for the Boosty API""" root = "https://api.boosty.to" def __init__(self, extractor, access_token=None): self.extractor = extractor self.headers = { "Accept": "application/json, text/plain, */*", "Origin": 
extractor.root, } if not access_token: if auth := self.extractor.cookies.get("auth", domain=".boosty.to"): access_token = text.extr( auth, "%22accessToken%22%3A%22", "%22") if access_token: self.headers["Authorization"] = "Bearer " + access_token def blog_posts(self, username, params): endpoint = f"/v1/blog/{username}/post/" params = self._merge_params(params, { "limit" : "5", "offset" : None, "comments_limit": "2", "reply_limit" : "1", }) return self._pagination(endpoint, params) def blog_media_album(self, username, type="all", params=()): endpoint = f"/v1/blog/{username}/media_album/" params = self._merge_params(params, { "type" : type.rstrip("s"), "limit" : "15", "limit_by": "media", "offset" : None, }) return self._pagination(endpoint, params, self._transform_media_posts) def _transform_media_posts(self, data): posts = [] for obj in data["mediaPosts"]: post = obj["post"] post["data"] = obj["media"] posts.append(post) return posts def post(self, username, post_id): endpoint = f"/v1/blog/{username}/post/{post_id}" return self._call(endpoint) def feed_posts(self, params=None): endpoint = "/v1/feed/post/" params = self._merge_params(params, { "limit" : "5", "offset" : None, "comments_limit": "2", }) if "only_allowed" not in params and self.extractor.only_allowed: params["only_allowed"] = "true" if "only_bought" not in params and self.extractor.only_bought: params["only_bought"] = "true" return self._pagination(endpoint, params, key="posts") def user(self, username): endpoint = "/v1/blog/" + username user = self._call(endpoint) user["id"] = user["owner"]["id"] return user def user_subscriptions(self, params=None): endpoint = "/v1/user/subscriptions" params = self._merge_params(params, { "limit" : "30", "with_follow": "true", "offset" : None, }) return self._pagination_users(endpoint, params) def _merge_params(self, params_web, params_api): if params_web: web_to_api = { "isOnlyAllowedPosts": "is_only_allowed", "postsTagsIds" : "tags_ids", "postsFrom" : "from_ts", "postsTo" : "to_ts", } for name, value in params_web.items(): name = web_to_api.get(name, name) params_api[name] = value return params_api def _call(self, endpoint, params=None): url = self.root + endpoint while True: response = self.extractor.request( url, params=params, headers=self.headers, fatal=None, allow_redirects=False) if response.status_code < 300: return response.json() elif response.status_code < 400: raise exception.AuthenticationError("Invalid API access token") elif response.status_code == 429: self.extractor.wait(seconds=600) else: self.extractor.log.debug(response.text) raise exception.AbortExtraction("API request failed") def _pagination(self, endpoint, params, transform=None, key=None): if "is_only_allowed" not in params and self.extractor.only_allowed: params["only_allowed"] = "true" params["is_only_allowed"] = "true" while True: data = self._call(endpoint, params) if transform: yield from transform(data["data"]) elif key: yield from data["data"][key] else: yield from data["data"] extra = data["extra"] if extra.get("isLast"): return offset = extra.get("offset") if not offset: return params["offset"] = offset def _pagination_users(self, endpoint, params): while True: data = self._call(endpoint, params) yield from data["data"] offset = data["offset"] + data["limit"] if offset > data["total"]: return params["offset"] = offset def dialog(self, dialog_id): endpoint = f"/v1/dialog/{dialog_id}" return self._call(endpoint) def dialog_messages(self, dialog_id, limit=300, offset=None): endpoint = 
f"/v1/dialog/{dialog_id}/message/" params = { "limit": limit, "reverse": "true", "offset": offset, } return self._pagination_dialog(endpoint, params) def _pagination_dialog(self, endpoint, params): while True: data = self._call(endpoint, params) yield from data["data"] try: extra = data["extra"] if extra.get("isLast"): break params["offset"] = offset = extra["offset"] if not offset: break except Exception: break �����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1753336256.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.30.2/gallery_dl/extractor/bunkr.py�����������������������������������������������������0000644�0001750�0001750�00000016256�15040344700�020242� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2022-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://bunkr.si/""" from .common import Extractor from .lolisafe import LolisafeAlbumExtractor from .. 
import text, util, config, exception import random if config.get(("extractor", "bunkr"), "tlds"): BASE_PATTERN = ( r"(?:bunkr:(?:https?://)?([^/?#]+)|" r"(?:https?://)?(?:app\.)?(bunkr+\.\w+))" ) else: BASE_PATTERN = ( r"(?:bunkr:(?:https?://)?([^/?#]+)|" r"(?:https?://)?(?:app\.)?(bunkr+" r"\.(?:s[kiu]|c[ir]|fi|p[hks]|ru|la|is|to|a[cx]" r"|black|cat|media|red|site|ws|org)))" ) DOMAINS = [ "bunkr.ac", "bunkr.ci", "bunkr.cr", "bunkr.fi", "bunkr.ph", "bunkr.pk", "bunkr.ps", "bunkr.si", "bunkr.sk", "bunkr.ws", "bunkr.black", "bunkr.red", "bunkr.media", "bunkr.site", ] LEGACY_DOMAINS = { "bunkr.ax", "bunkr.cat", "bunkr.ru", "bunkrr.ru", "bunkr.su", "bunkrr.su", "bunkr.la", "bunkr.is", "bunkr.to", } CF_DOMAINS = set() class BunkrAlbumExtractor(LolisafeAlbumExtractor): """Extractor for bunkr.si albums""" category = "bunkr" root = "https://bunkr.si" root_dl = "https://get.bunkrr.su" root_api = "https://apidl.bunkr.ru" archive_fmt = "{album_id}_{id|id_url}" pattern = BASE_PATTERN + r"/a/([^/?#]+)" example = "https://bunkr.si/a/ID" def __init__(self, match): LolisafeAlbumExtractor.__init__(self, match) domain = self.groups[0] or self.groups[1] if domain not in LEGACY_DOMAINS: self.root = "https://" + domain def _init(self): LolisafeAlbumExtractor._init(self) endpoint = self.config("endpoint") if not endpoint: endpoint = self.root_api + "/api/_001_v2" elif endpoint[0] == "/": endpoint = self.root_api + endpoint self.endpoint = endpoint self.offset = 0 def skip(self, num): self.offset = num return num def request(self, url, **kwargs): kwargs["encoding"] = "utf-8" kwargs["allow_redirects"] = False while True: try: response = Extractor.request(self, url, **kwargs) if response.status_code < 300: return response # redirect url = response.headers["Location"] if url[0] == "/": url = self.root + url continue root, path = self._split(url) if root not in CF_DOMAINS: continue self.log.debug("Redirect to known CF challenge domain '%s'", root) except exception.HttpError as exc: if exc.status != 403: raise # CF challenge root, path = self._split(url) CF_DOMAINS.add(root) self.log.debug("Added '%s' to CF challenge domains", root) try: DOMAINS.remove(root.rpartition("/")[2]) except ValueError: pass else: if not DOMAINS: raise exception.AbortExtraction( "All Bunkr domains require solving a CF challenge") # select alternative domain self.root = root = "https://" + random.choice(DOMAINS) self.log.debug("Trying '%s' as fallback", root) url = root + path def fetch_album(self, album_id): # album metadata page = self.request(self.root + "/a/" + album_id).text title = text.unescape(text.unescape(text.extr( page, 'property="og:title" content="', '"'))) # files items = list(text.extract_iter( page, '<div class="grid-images_box', "</a>")) return self._extract_files(items), { "album_id" : album_id, "album_name" : title, "album_size" : text.extr( page, '<span class="font-semibold">(', ')'), "count" : len(items), } def _extract_files(self, items): if self.offset: items = util.advance(items, self.offset) for item in items: try: url = text.unescape(text.extr(item, ' href="', '"')) if url[0] == "/": url = self.root + url file = self._extract_file(url) info = text.split_html(item) if not file["name"]: file["name"] = info[-3] file["size"] = info[-2] file["date"] = text.parse_datetime( info[-1], "%H:%M:%S %d/%m/%Y") yield file except exception.ControlException: raise except Exception as exc: self.log.error("%s: %s", exc.__class__.__name__, exc) self.log.debug("", exc_info=exc) def _extract_file(self, webpage_url): page = 
self.request(webpage_url).text data_id = text.extr(page, 'data-file-id="', '"') referer = self.root_dl + "/file/" + data_id headers = {"Referer": referer, "Origin": self.root_dl} data = self.request_json(self.endpoint, method="POST", headers=headers, json={"id": data_id}) if data.get("encrypted"): key = f"SECRET_KEY_{data['timestamp'] // 3600}" file_url = util.decrypt_xor(data["url"], key.encode()) else: file_url = data["url"] file_name = text.extr(page, "<h1", "<").rpartition(">")[2] fallback = text.extr(page, 'property="og:url" content="', '"') return { "file" : file_url, "name" : text.unescape(file_name), "id_url" : data_id, "_fallback" : (fallback,) if fallback else (), "_http_headers" : {"Referer": referer}, "_http_validate": self._validate, } def _validate(self, response): if response.history and response.url.endswith("/maintenance-vid.mp4"): self.log.warning("File server in maintenance mode") return False return True def _split(self, url): pos = url.index("/", 8) return url[:pos], url[pos:] class BunkrMediaExtractor(BunkrAlbumExtractor): """Extractor for bunkr.si media links""" subcategory = "media" directory_fmt = ("{category}",) pattern = BASE_PATTERN + r"(/[fvid]/[^/?#]+)" example = "https://bunkr.si/f/FILENAME" def fetch_album(self, album_id): try: file = self._extract_file(self.root + album_id) except Exception as exc: self.log.error("%s: %s", exc.__class__.__name__, exc) return (), {} return (file,), { "album_id" : "", "album_name" : "", "album_size" : -1, "description": "", "count" : 1, } ��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1753336256.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.30.2/gallery_dl/extractor/catbox.py����������������������������������������������������0000644�0001750�0001750�00000003512�15040344700�020370� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2022-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the 
terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://catbox.moe/""" from .common import GalleryExtractor, Extractor, Message from .. import text class CatboxAlbumExtractor(GalleryExtractor): """Extractor for catbox albums""" category = "catbox" subcategory = "album" root = "https://catbox.moe" filename_fmt = "{filename}.{extension}" directory_fmt = ("{category}", "{album_name} ({album_id})") archive_fmt = "{album_id}_{filename}" pattern = r"(?:https?://)?(?:www\.)?catbox\.moe(/c/[^/?#]+)" example = "https://catbox.moe/c/ID" def metadata(self, page): extr = text.extract_from(page) return { "album_id" : self.page_url.rpartition("/")[2], "album_name" : text.unescape(extr("<h1>", "<")), "date" : text.parse_datetime(extr( "<p>Created ", "<"), "%B %d %Y"), "description": text.unescape(extr("<p>", "<")), } def images(self, page): return [ ("https://files.catbox.moe/" + path, None) for path in text.extract_iter( page, ">https://files.catbox.moe/", "<") ] class CatboxFileExtractor(Extractor): """Extractor for catbox files""" category = "catbox" subcategory = "file" archive_fmt = "{filename}" pattern = r"(?:https?://)?(?:files|litter|de)\.catbox\.moe/([^/?#]+)" example = "https://files.catbox.moe/NAME.EXT" def items(self): url = text.ensure_http_scheme(self.url) file = text.nameext_from_url(url, {"url": url}) yield Message.Directory, file yield Message.Url, url, file ��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1753336256.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.30.2/gallery_dl/extractor/chevereto.py�������������������������������������������������0000644�0001750�0001750�00000007511�15040344700�021077� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2023-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for Chevereto galleries""" from .common import BaseExtractor, Message from .. 
import text, util class CheveretoExtractor(BaseExtractor): """Base class for chevereto extractors""" basecategory = "chevereto" directory_fmt = ("{category}", "{user}", "{album}",) archive_fmt = "{id}" def _init(self): self.path = self.groups[-1] def _pagination(self, url): while True: page = self.request(url).text for item in text.extract_iter( page, '<div class="list-item-image ', 'image-container'): yield text.urljoin(self.root, text.extr( item, '<a href="', '"')) url = text.extr(page, 'data-pagination="next" href="', '"') if not url: return if url[0] == "/": url = self.root + url BASE_PATTERN = CheveretoExtractor.update({ "jpgfish": { "root": "https://jpg5.su", "pattern": r"jpe?g\d?\.(?:su|pet|fish(?:ing)?|church)", }, "imgkiwi": { "root": "https://img.kiwi", "pattern": r"img\.kiwi", }, "imagepond": { "root": "https://imagepond.net", "pattern": r"imagepond\.net", }, }) class CheveretoImageExtractor(CheveretoExtractor): """Extractor for chevereto Images""" subcategory = "image" pattern = BASE_PATTERN + r"(/im(?:g|age)/[^/?#]+)" example = "https://jpg2.su/img/TITLE.ID" def items(self): url = self.root + self.path page = self.request(url).text extr = text.extract_from(page) url = (extr('<meta property="og:image" content="', '"') or extr('url: "', '"')) if not url or url.endswith("/loading.svg"): pos = page.find(" download=") url = text.rextr(page, 'href="', '"', pos) if not url.startswith("https://"): url = util.decrypt_xor( url, b"seltilovessimpcity@simpcityhatesscrapers", fromhex=True) image = { "id" : self.path.rpartition(".")[2], "url" : url, "album": text.extr(extr("Added to <a", "/a>"), ">", "<"), "date" : text.parse_datetime(extr( '<span title="', '"'), "%Y-%m-%d %H:%M:%S"), "user" : extr('username: "', '"'), } text.nameext_from_url(image["url"], image) yield Message.Directory, image yield Message.Url, image["url"], image class CheveretoAlbumExtractor(CheveretoExtractor): """Extractor for chevereto Albums""" subcategory = "album" pattern = BASE_PATTERN + r"(/a(?:lbum)?/[^/?#]+(?:/sub)?)" example = "https://jpg2.su/album/TITLE.ID" def items(self): url = self.root + self.path data = {"_extractor": CheveretoImageExtractor} if self.path.endswith("/sub"): albums = self._pagination(url) else: albums = (url,) for album in albums: for image in self._pagination(album): yield Message.Queue, image, data class CheveretoUserExtractor(CheveretoExtractor): """Extractor for chevereto Users""" subcategory = "user" pattern = BASE_PATTERN + r"(/(?!img|image|a(?:lbum)?)[^/?#]+(?:/albums)?)" example = "https://jpg2.su/USER" def items(self): url = self.root + self.path if self.path.endswith("/albums"): data = {"_extractor": CheveretoAlbumExtractor} else: data = {"_extractor": CheveretoImageExtractor} for url in self._pagination(url): yield Message.Queue, url, data ���������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1753336256.0 
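
# ----------------------------------------------------------------------
# Illustrative sketch (standalone; an assumption about behavior): the kind
# of hex-decode-then-XOR step that util.decrypt_xor(..., fromhex=True) in
# CheveretoImageExtractor above appears to perform.  The key below is a
# placeholder, not the one used by any real site.

def xor_decrypt_hex(data, key):
    """Decode a hex string and XOR it with a repeating key."""
    raw = bytes.fromhex(data)
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(raw)).decode()


if __name__ == "__main__":
    key = b"placeholder-key"
    plain = b"https://img.example.net/file.jpg"
    encoded = bytes(
        b ^ key[i % len(key)] for i, b in enumerate(plain)
    ).hex()
    print(xor_decrypt_hex(encoded, key))  # https://img.example.net/file.jpg
# ----------------------------------------------------------------------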
����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.30.2/gallery_dl/extractor/cien.py������������������������������������������������������0000644�0001750�0001750�00000015522�15040344700�020032� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2024-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://ci-en.net/""" from .common import Extractor, Message from .. import text BASE_PATTERN = r"(?:https?://)?ci-en\.(?:net|dlsite\.com)" class CienExtractor(Extractor): category = "cien" root = "https://ci-en.net" request_interval = (1.0, 2.0) def __init__(self, match): self.root = text.root_from_url(match[0]) Extractor.__init__(self, match) def _init(self): self.cookies.set("accepted_rating", "r18g", domain="ci-en.dlsite.com") def _pagination_articles(self, url, params): data = {"_extractor": CienArticleExtractor} params["page"] = text.parse_int(params.get("page"), 1) while True: page = self.request(url, params=params).text for card in text.extract_iter( page, ' class="c-cardCase-item', '</div>'): article_url = text.extr(card, ' href="', '"') yield Message.Queue, article_url, data if ' rel="next"' not in page: return params["page"] += 1 class CienArticleExtractor(CienExtractor): subcategory = "article" filename_fmt = "{num:>02} {filename}.{extension}" directory_fmt = ("{category}", "{author[name]}", "{post_id} {name}") archive_fmt = "{post_id}_{num}" pattern = BASE_PATTERN + r"/creator/(\d+)/article/(\d+)" example = "https://ci-en.net/creator/123/article/12345" def items(self): url = f"{self.root}/creator/{self.groups[0]}/article/{self.groups[1]}" page = self.request(url, notfound="article").text files = self._extract_files(page) post = self._extract_jsonld(page)[0] post["post_url"] = url post["post_id"] = text.parse_int(self.groups[1]) post["count"] = len(files) post["date"] = text.parse_datetime(post["datePublished"]) try: del post["publisher"] del post["sameAs"] except Exception: pass yield Message.Directory, post for post["num"], file in enumerate(files, 1): post.update(file) if "extension" not in file: text.nameext_from_url(file["url"], post) yield Message.Url, file["url"], post def _extract_files(self, page): files = [] filetypes = self.config("files") if filetypes is None: self._extract_files_image(page, files) self._extract_files_video(page, files) self._extract_files_download(page, files) self._extract_files_gallery(page, files) else: generators = { "image" : self._extract_files_image, "video" : self._extract_files_video, "download": self._extract_files_download, "gallery" : self._extract_files_gallery, "gallerie": 
self._extract_files_gallery, } if isinstance(filetypes, str): filetypes = filetypes.split(",") for ft in filetypes: generators[ft.rstrip("s")](page, files) return files def _extract_files_image(self, page, files): for image in text.extract_iter( page, 'class="file-player-image"', "</figure>"): size = text.extr(image, ' data-size="', '"') w, _, h = size.partition("x") files.append({ "url" : text.extr(image, ' data-raw="', '"'), "width" : text.parse_int(w), "height": text.parse_int(h), "type" : "image", }) def _extract_files_video(self, page, files): for video in text.extract_iter( page, "<vue-file-player", "</vue-file-player>"): path = text.extr(video, ' base-path="', '"') name = text.extr(video, ' file-name="', '"') auth = text.extr(video, ' auth-key="', '"') file = text.nameext_from_url(name) file["url"] = f"{path}video-web.mp4?{auth}" file["type"] = "video" files.append(file) def _extract_files_download(self, page, files): for download in text.extract_iter( page, 'class="downloadBlock', "</div>"): name = text.extr(download, "<p>", "<") file = text.nameext_from_url(name.rpartition(" ")[0]) file["url"] = text.extr(download, ' href="', '"') file["type"] = "download" files.append(file) def _extract_files_gallery(self, page, files): for gallery in text.extract_iter( page, "<vue-image-gallery", "</vue-image-gallery>"): url = self.root + "/api/creator/gallery/images" params = { "hash" : text.extr(gallery, ' hash="', '"'), "gallery_id": text.extr(gallery, ' gallery-id="', '"'), "time" : text.extr(gallery, ' time="', '"'), } data = self.request_json(url, params=params) url = self.root + "/api/creator/gallery/imagePath" for params["page"], params["file_id"] in enumerate( data["imgList"]): path = self.request_json(url, params=params)["path"] file = params.copy() file["url"] = path files.append(file) class CienCreatorExtractor(CienExtractor): subcategory = "creator" pattern = BASE_PATTERN + r"/creator/(\d+)(?:/article(?:\?([^#]+))?)?/?$" example = "https://ci-en.net/creator/123" def items(self): url = f"{self.root}/creator/{self.groups[0]}/article" params = text.parse_query(self.groups[1]) params["mode"] = "list" return self._pagination_articles(url, params) class CienRecentExtractor(CienExtractor): subcategory = "recent" pattern = BASE_PATTERN + r"/mypage/recent(?:\?([^#]+))?" example = "https://ci-en.net/mypage/recent" def items(self): url = self.root + "/mypage/recent" params = text.parse_query(self.groups[0]) return self._pagination_articles(url, params) class CienFollowingExtractor(CienExtractor): subcategory = "following" pattern = BASE_PATTERN + r"/mypage/subscription(/following)?" 
example = "https://ci-en.net/mypage/subscription" def items(self): url = self.root + "/mypage/subscription" + (self.groups[0] or "") page = self.request(url).text data = {"_extractor": CienCreatorExtractor} for subscription in text.extract_iter( page, 'class="c-grid-subscriptionInfo', '</figure>'): url = text.extr(subscription, ' href="', '"') yield Message.Queue, url, data ������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1753381746.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.30.2/gallery_dl/extractor/civitai.py���������������������������������������������������0000644�0001750�0001750�00000075404�15040475562�020564� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2024-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.civitai.com/""" from .common import Extractor, Message, Dispatch from .. 
import text, util, exception from ..cache import memcache import itertools import time BASE_PATTERN = r"(?:https?://)?civitai\.com" USER_PATTERN = BASE_PATTERN + r"/user/([^/?#]+)" class CivitaiExtractor(Extractor): """Base class for civitai extractors""" category = "civitai" root = "https://civitai.com" directory_fmt = ("{category}", "{user[username]}", "images") filename_fmt = "{file[id]}.{extension}" archive_fmt = "{file[uuid]}" request_interval = (0.5, 1.5) def _init(self): if self.config("api") == "rest": self.log.debug("Using REST API") self.api = CivitaiRestAPI(self) else: self.log.debug("Using tRPC API") self.api = CivitaiTrpcAPI(self) if quality := self.config("quality"): if not isinstance(quality, str): quality = ",".join(quality) self._image_quality = quality self._image_ext = ("png" if quality == "original=true" else "jpg") else: self._image_quality = "original=true" self._image_ext = "png" if quality_video := self.config("quality-videos"): if not isinstance(quality_video, str): quality_video = ",".join(quality_video) if quality_video[0] == "+": quality_video = (self._image_quality + "," + quality_video.lstrip("+,")) self._video_quality = quality_video elif quality_video is not None and quality: self._video_quality = self._image_quality else: self._video_quality = "quality=100" self._video_ext = "webm" if metadata := self.config("metadata"): if isinstance(metadata, str): metadata = metadata.split(",") elif not isinstance(metadata, (list, tuple)): metadata = ("generation", "version", "post") self._meta_generation = ("generation" in metadata) self._meta_version = ("version" in metadata) self._meta_post = ("post" in metadata) else: self._meta_generation = self._meta_version = self._meta_post = \ False def items(self): if models := self.models(): data = {"_extractor": CivitaiModelExtractor} for model in models: url = f"{self.root}/models/{model['id']}" yield Message.Queue, url, data return if posts := self.posts(): for post in posts: if "images" in post: images = post["images"] else: images = self.api.images_post(post["id"]) post = self.api.post(post["id"]) post["date"] = text.parse_datetime( post["publishedAt"], "%Y-%m-%dT%H:%M:%S.%fZ") data = { "post": post, "user": post.pop("user"), } if self._meta_version: data["model"], data["version"] = \ self._extract_meta_version(post) yield Message.Directory, data for file in self._image_results(images): file.update(data) yield Message.Url, file["url"], file return if images := self.images(): for file in images: data = { "file": file, "user": file.pop("user"), } if self._meta_generation: data["generation"] = \ self._extract_meta_generation(file) if self._meta_version: data["model"], data["version"] = \ self._extract_meta_version(file, False) if "post" in file: data["post"] = file.pop("post") if self._meta_post and "post" not in data: data["post"] = post = self._extract_meta_post(file) if post: post.pop("user", None) file["date"] = text.parse_datetime( file["createdAt"], "%Y-%m-%dT%H:%M:%S.%fZ") data["url"] = url = self._url(file) text.nameext_from_url(url, data) if not data["extension"]: data["extension"] = ( self._video_ext if file.get("type") == "video" else self._image_ext) yield Message.Directory, data yield Message.Url, url, data return def models(self): return () def posts(self): return () def images(self): return () def _url(self, image): url = image["url"] video = image.get("type") == "video" quality = self._video_quality if video else self._image_quality if "/" in url: parts = url.rsplit("/", 3) image["uuid"] = parts[1] parts[2] = 
quality return "/".join(parts) image["uuid"] = url name = image.get("name") if not name: if mime := image.get("mimeType"): name = f"{image.get('id')}.{mime.rpartition('/')[2]}" else: ext = self._video_ext if video else self._image_ext name = f"{image.get('id')}.{ext}" return (f"https://image.civitai.com/xG1nkqKTMzGDvpLrqFT7WA" f"/{url}/{quality}/{name}") def _image_results(self, images): for num, file in enumerate(images, 1): data = text.nameext_from_url(file["url"], { "num" : num, "file": file, "url" : self._url(file), }) if not data["extension"]: data["extension"] = ( self._video_ext if file.get("type") == "video" else self._image_ext) if "id" not in file and data["filename"].isdecimal(): file["id"] = text.parse_int(data["filename"]) if "date" not in file: file["date"] = text.parse_datetime( file["createdAt"], "%Y-%m-%dT%H:%M:%S.%fZ") if self._meta_generation: file["generation"] = self._extract_meta_generation(file) yield data def _image_reactions(self): self._require_auth() params = self.params params["authed"] = True params["useIndex"] = False if "reactions" not in params: params["reactions"] = ("Like", "Dislike", "Heart", "Laugh", "Cry") return self.api.images(params) def _require_auth(self): if "Authorization" not in self.api.headers and \ not self.cookies.get( "__Secure-civitai-token", domain=".civitai.com"): raise exception.AuthRequired(("'api-key'", "cookies")) def _parse_query(self, value): return text.parse_query_list( value, {"tags", "reactions", "baseModels", "tools", "techniques", "types", "fileFormats"}) def _extract_meta_generation(self, image): try: return self.api.image_generationdata(image["id"]) except Exception as exc: return self.log.debug("", exc_info=exc) def _extract_meta_post(self, image): try: post = self.api.post(image["postId"]) post["date"] = text.parse_datetime( post["publishedAt"], "%Y-%m-%dT%H:%M:%S.%fZ") return post except Exception as exc: return self.log.debug("", exc_info=exc) def _extract_meta_version(self, item, is_post=True): try: if version_id := self._extract_version_id(item, is_post): version = self.api.model_version(version_id).copy() return version.pop("model", None), version except Exception as exc: self.log.debug("", exc_info=exc) return None, None def _extract_version_id(self, item, is_post=True): if version_id := item.get("modelVersionId"): return version_id if version_ids := item.get("modelVersionIds"): return version_ids[0] if version_ids := item.get("modelVersionIdsManual"): return version_ids[0] if is_post: return None item["post"] = post = self.api.post(item["postId"]) post.pop("user", None) return self._extract_version_id(post) class CivitaiModelExtractor(CivitaiExtractor): subcategory = "model" directory_fmt = ("{category}", "{user[username]}", "{model[id]}{model[name]:? //}", "{version[id]}{version[name]:? //}") pattern = BASE_PATTERN + r"/models/(\d+)(?:/?\?modelVersionId=(\d+))?" 
example = "https://civitai.com/models/12345/TITLE" def items(self): model_id, version_id = self.groups model = self.api.model(model_id) if "user" in model: user = model["user"] del model["user"] else: user = model["creator"] del model["creator"] versions = model["modelVersions"] del model["modelVersions"] if version_id: version_id = int(version_id) for version in versions: if version["id"] == version_id: break else: version = self.api.model_version(version_id) versions = (version,) for version in versions: version["date"] = text.parse_datetime( version["createdAt"], "%Y-%m-%dT%H:%M:%S.%fZ") data = { "model" : model, "version": version, "user" : user, } yield Message.Directory, data for file in self._extract_files(model, version, user): file.update(data) yield Message.Url, file["url"], file def _extract_files(self, model, version, user): filetypes = self.config("files") if filetypes is None: return self._extract_files_image(model, version, user) generators = { "model" : self._extract_files_model, "image" : self._extract_files_image, "gallery" : self._extract_files_gallery, "gallerie": self._extract_files_gallery, } if isinstance(filetypes, str): filetypes = filetypes.split(",") return itertools.chain.from_iterable( generators[ft.rstrip("s")](model, version, user) for ft in filetypes ) def _extract_files_model(self, model, version, user): files = [] for num, file in enumerate(version["files"], 1): name, sep, ext = file["name"].rpartition(".") if not sep: name = ext ext = "bin" file["uuid"] = f"model-{model['id']}-{version['id']}-{file['id']}" files.append({ "num" : num, "file" : file, "filename" : name, "extension": ext, "url" : ( file.get("downloadUrl") or f"{self.root}/api/download/models/{version['id']}"), "_http_headers" : { "Authorization": self.api.headers.get("Authorization")}, "_http_validate": self._validate_file_model, }) return files def _extract_files_image(self, model, version, user): if "images" in version: images = version["images"] else: params = { "modelVersionId": version["id"], "prioritizedUserIds": (user["id"],), "period": "AllTime", "sort": "Most Reactions", "limit": 20, "pending": True, } images = self.api.images(params, defaults=False) return self._image_results(images) def _extract_files_gallery(self, model, version, user): images = self.api.images_gallery(model, version, user) return self._image_results(images) def _validate_file_model(self, response): if response.headers.get("Content-Type", "").startswith("text/html"): alert = text.extr( response.text, 'mantine-Alert-message">', "</div></div></div>") if alert: msg = f"\"{text.remove_html(alert)}\" - 'api-key' required" else: msg = "'api-key' required to download this file" self.log.warning(msg) return False return True class CivitaiImageExtractor(CivitaiExtractor): subcategory = "image" pattern = BASE_PATTERN + r"/images/(\d+)" example = "https://civitai.com/images/12345" def images(self): return self.api.image(self.groups[0]) class CivitaiPostExtractor(CivitaiExtractor): subcategory = "post" directory_fmt = ("{category}", "{username|user[username]}", "posts", "{post[id]}{post[title]:? 
//}") pattern = BASE_PATTERN + r"/posts/(\d+)" example = "https://civitai.com/posts/12345" def posts(self): return ({"id": int(self.groups[0])},) class CivitaiTagExtractor(CivitaiExtractor): subcategory = "tag" pattern = BASE_PATTERN + r"/tag/([^/?&#]+)" example = "https://civitai.com/tag/TAG" def models(self): tag = text.unquote(self.groups[0]) return self.api.models_tag(tag) class CivitaiSearchModelsExtractor(CivitaiExtractor): subcategory = "search-models" pattern = BASE_PATTERN + r"/search/models\?([^#]+)" example = "https://civitai.com/search/models?query=QUERY" def models(self): params = self._parse_query(self.groups[0]) return CivitaiSearchAPI(self).search_models( params.get("query"), params.get("sortBy"), self.api.nsfw) class CivitaiSearchImagesExtractor(CivitaiExtractor): subcategory = "search-images" pattern = BASE_PATTERN + r"/search/images\?([^#]+)" example = "https://civitai.com/search/images?query=QUERY" def images(self): params = self._parse_query(self.groups[0]) return CivitaiSearchAPI(self).search_images( params.get("query"), params.get("sortBy"), self.api.nsfw) class CivitaiModelsExtractor(CivitaiExtractor): subcategory = "models" pattern = BASE_PATTERN + r"/models(?:/?\?([^#]+))?(?:$|#)" example = "https://civitai.com/models" def models(self): params = self._parse_query(self.groups[0]) return self.api.models(params) class CivitaiImagesExtractor(CivitaiExtractor): subcategory = "images" pattern = BASE_PATTERN + r"/images(?:/?\?([^#]+))?(?:$|#)" example = "https://civitai.com/images" def images(self): params = self._parse_query(self.groups[0]) return self.api.images(params) class CivitaiPostsExtractor(CivitaiExtractor): subcategory = "posts" pattern = BASE_PATTERN + r"/posts(?:/?\?([^#]+))?(?:$|#)" example = "https://civitai.com/posts" def posts(self): params = self._parse_query(self.groups[0]) return self.api.posts(params) class CivitaiUserExtractor(Dispatch, CivitaiExtractor): pattern = USER_PATTERN + r"/?(?:$|\?|#)" example = "https://civitai.com/user/USER" def items(self): base = f"{self.root}/user/{self.groups[0]}/" return self._dispatch_extractors(( (CivitaiUserModelsExtractor, base + "models"), (CivitaiUserPostsExtractor , base + "posts"), (CivitaiUserImagesExtractor, base + "images"), (CivitaiUserVideosExtractor, base + "videos"), ), ("user-images", "user-videos")) class CivitaiUserModelsExtractor(CivitaiExtractor): subcategory = "user-models" pattern = USER_PATTERN + r"/models/?(?:\?([^#]+))?" example = "https://civitai.com/user/USER/models" def models(self): user, query = self.groups params = self._parse_query(query) params["username"] = text.unquote(user) return self.api.models(params) class CivitaiUserPostsExtractor(CivitaiExtractor): subcategory = "user-posts" directory_fmt = ("{category}", "{username|user[username]}", "posts", "{post[id]}{post[title]:? //}") pattern = USER_PATTERN + r"/posts/?(?:\?([^#]+))?" example = "https://civitai.com/user/USER/posts" def posts(self): user, query = self.groups params = self._parse_query(query) params["username"] = text.unquote(user) return self.api.posts(params) class CivitaiUserImagesExtractor(CivitaiExtractor): subcategory = "user-images" pattern = USER_PATTERN + r"/images/?(?:\?([^#]+))?" 
example = "https://civitai.com/user/USER/images" def __init__(self, match): user, query = match.groups() self.params = self._parse_query(query) if self.params.get("section") == "reactions": self.subcategory = "reactions-images" self.images = self._image_reactions else: self.params["username"] = text.unquote(user) CivitaiExtractor.__init__(self, match) def images(self): return self.api.images(self.params) class CivitaiUserVideosExtractor(CivitaiExtractor): subcategory = "user-videos" directory_fmt = ("{category}", "{username|user[username]}", "videos") pattern = USER_PATTERN + r"/videos/?(?:\?([^#]+))?" example = "https://civitai.com/user/USER/videos" def __init__(self, match): user, query = match.groups() self.params = self._parse_query(query) self.params["types"] = ("video",) if self.params.get("section") == "reactions": self.subcategory = "reactions-videos" self.images = self._image_reactions else: self.params["username"] = text.unquote(user) CivitaiExtractor.__init__(self, match) images = CivitaiUserImagesExtractor.images class CivitaiGeneratedExtractor(CivitaiExtractor): """Extractor for your generated files feed""" subcategory = "generated" filename_fmt = "{filename}.{extension}" directory_fmt = ("{category}", "generated") pattern = f"{BASE_PATTERN}/generate" example = "https://civitai.com/generate" def items(self): self._require_auth() for gen in self.api.orchestrator_queryGeneratedImages(): gen["date"] = text.parse_datetime( gen["createdAt"], "%Y-%m-%dT%H:%M:%S.%fZ") yield Message.Directory, gen for step in gen.pop("steps", ()): for image in step.pop("images", ()): data = {"file": image, **step, **gen} url = image["url"] yield Message.Url, url, text.nameext_from_url(url, data) class CivitaiRestAPI(): """Interface for the Civitai Public REST API https://developer.civitai.com/docs/api/public-rest """ def __init__(self, extractor): self.extractor = extractor self.root = extractor.root + "/api" self.headers = {"Content-Type": "application/json"} if api_key := extractor.config("api-key"): extractor.log.debug("Using api_key authentication") self.headers["Authorization"] = "Bearer " + api_key nsfw = extractor.config("nsfw") if nsfw is None or nsfw is True: nsfw = "X" elif not nsfw: nsfw = "Safe" self.nsfw = nsfw def image(self, image_id): return self.images({ "imageId": image_id, }) def images(self, params): endpoint = "/v1/images" if "nsfw" not in params: params["nsfw"] = self.nsfw return self._pagination(endpoint, params) def images_gallery(self, model, version, user): return self.images({ "modelId" : model["id"], "modelVersionId": version["id"], }) def model(self, model_id): endpoint = f"/v1/models/{model_id}" return self._call(endpoint) @memcache(keyarg=1) def model_version(self, model_version_id): endpoint = f"/v1/model-versions/{model_version_id}" return self._call(endpoint) def models(self, params): return self._pagination("/v1/models", params) def models_tag(self, tag): return self.models({"tag": tag}) def _call(self, endpoint, params=None): if endpoint[0] == "/": url = self.root + endpoint else: url = endpoint response = self.extractor.request( url, params=params, headers=self.headers) return response.json() def _pagination(self, endpoint, params): while True: data = self._call(endpoint, params) yield from data["items"] try: endpoint = data["metadata"]["nextPage"] except KeyError: return params = None class CivitaiTrpcAPI(): """Interface for the Civitai tRPC API""" def __init__(self, extractor): self.extractor = extractor self.root = extractor.root + "/api/trpc/" self.headers = { 
"content-type" : "application/json", "x-client-version": "5.0.920", "x-client-date" : "", "x-client" : "web", "x-fingerprint" : "undefined", } if api_key := extractor.config("api-key"): extractor.log.debug("Using api_key authentication") self.headers["Authorization"] = "Bearer " + api_key nsfw = extractor.config("nsfw") if nsfw is None or nsfw is True: nsfw = 31 elif not nsfw: nsfw = 1 self.nsfw = nsfw def image(self, image_id): endpoint = "image.get" params = {"id": int(image_id)} return (self._call(endpoint, params),) def image_generationdata(self, image_id): endpoint = "image.getGenerationData" params = {"id": int(image_id)} return self._call(endpoint, params) def images(self, params, defaults=True): endpoint = "image.getInfinite" if defaults: params = self._merge_params(params, { "useIndex" : True, "period" : "AllTime", "sort" : "Newest", "types" : ("image",), "withMeta" : False, # Metadata Only "fromPlatform" : False, # Made On-Site "browsingLevel": self.nsfw, "include" : ("cosmetics",), }) params = self._type_params(params) return self._pagination(endpoint, params) def images_gallery(self, model, version, user): endpoint = "image.getImagesAsPostsInfinite" params = { "period" : "AllTime", "sort" : "Newest", "modelVersionId": version["id"], "modelId" : model["id"], "hidden" : False, "limit" : 50, "browsingLevel" : self.nsfw, } for post in self._pagination(endpoint, params): yield from post["images"] def images_post(self, post_id): params = { "postId" : int(post_id), "pending": True, } return self.images(params) def model(self, model_id): endpoint = "model.getById" params = {"id": int(model_id)} return self._call(endpoint, params) @memcache(keyarg=1) def model_version(self, model_version_id): endpoint = "modelVersion.getById" params = {"id": int(model_version_id)} return self._call(endpoint, params) def models(self, params, defaults=True): endpoint = "model.getAll" if defaults: params = self._merge_params(params, { "period" : "AllTime", "periodMode" : "published", "sort" : "Newest", "pending" : False, "hidden" : False, "followed" : False, "earlyAccess" : False, "fromPlatform" : False, "supportsGeneration": False, "browsingLevel": self.nsfw, }) return self._pagination(endpoint, params) def models_tag(self, tag): return self.models({"tagname": tag}) def post(self, post_id): endpoint = "post.get" params = {"id": int(post_id)} return self._call(endpoint, params) def posts(self, params, defaults=True): endpoint = "post.getInfinite" meta = {"cursor": ("Date",)} if defaults: params = self._merge_params(params, { "browsingLevel": self.nsfw, "period" : "AllTime", "periodMode" : "published", "sort" : "Newest", "followed" : False, "draftOnly" : False, "pending" : True, "include" : ("cosmetics",), }) params = self._type_params(params) return self._pagination(endpoint, params, meta) def user(self, username): endpoint = "user.getCreator" params = {"username": username} return (self._call(endpoint, params),) def orchestrator_queryGeneratedImages(self): endpoint = "orchestrator.queryGeneratedImages" params = { "ascending": False, "tags" : ("gen",), "authed" : True, } return self._pagination(endpoint, params) def _call(self, endpoint, params, meta=None): url = self.root + endpoint headers = self.headers if meta: input = {"json": params, "meta": {"values": meta}} else: input = {"json": params} params = {"input": util.json_dumps(input)} headers["x-client-date"] = str(int(time.time() * 1000)) response = self.extractor.request(url, params=params, headers=headers) return 
response.json()["result"]["data"]["json"] def _pagination(self, endpoint, params, meta=None): if "cursor" not in params: params["cursor"] = None meta_ = {"cursor": ("undefined",)} while True: data = self._call(endpoint, params, meta_) yield from data["items"] try: if not data["nextCursor"]: return except KeyError: return params["cursor"] = data["nextCursor"] meta_ = meta def _merge_params(self, params_user, params_default): """Combine 'params_user' with 'params_default'""" params_default.update(params_user) return params_default def _type_params(self, params): """Convert 'params' values to expected types""" types = { "tags" : int, "tools" : int, "techniques" : int, "modelId" : int, "modelVersionId": int, "remixesOnly" : _bool, "nonRemixesOnly": _bool, "withMeta" : _bool, "fromPlatform" : _bool, "supportsGeneration": _bool, } for name, value in params.items(): if name not in types: continue elif isinstance(value, str): params[name] = types[name](value) elif isinstance(value, list): type = types[name] params[name] = [type(item) for item in value] return params def _bool(value): return value == "true" class CivitaiSearchAPI(): def __init__(self, extractor): self.extractor = extractor self.root = "https://search.civitai.com" self.headers = { "Authorization": "Bearer ab8565e5ab8dc2d8f0d4256d204781cb63fe8b031" "eb3779cbbed38a7b5308e5c", "Content-Type": "application/json", "X-Meilisearch-Client": "Meilisearch instant-meilisearch (v0.13.5)" " ; Meilisearch JavaScript (v0.34.0)", "Origin": extractor.root, "Sec-Fetch-Dest": "empty", "Sec-Fetch-Mode": "cors", "Sec-Fetch-Site": "same-site", "Priority": "u=4", } def search(self, query, type, facets, nsfw=31): endpoint = "/multi-search" query = { "q" : query, "indexUid": type, "facets" : facets, "attributesToHighlight": (), "highlightPreTag" : "__ais-highlight__", "highlightPostTag": "__/ais-highlight__", "limit" : 51, "offset": 0, "filter": (self._generate_filter(nsfw),), } return self._pagination(endpoint, query) def search_models(self, query, type=None, nsfw=31): facets = ( "category.name", "checkpointType", "fileFormats", "lastVersionAtUnix", "tags.name", "type", "user.username", "version.baseModel", ) return self.search(query, type or "models_v9", facets, nsfw) def search_images(self, query, type=None, nsfw=31): facets = ( "aspectRatio", "baseModel", "createdAtUnix", "tagNames", "techniqueNames", "toolNames", "type", "user.username", ) return self.search(query, type or "images_v6", facets, nsfw) def _call(self, endpoint, query): url = self.root + endpoint params = util.json_dumps({"queries": (query,)}) data = self.extractor.request_json( url, method="POST", headers=self.headers, data=params) return data["results"][0] def _pagination(self, endpoint, query): limit = query["limit"] - 1 threshold = limit // 2 while True: data = self._call(endpoint, query) items = data["hits"] yield from items if len(items) < threshold: return query["offset"] += limit def _generate_filter(self, level): fltr = [] if level & 1: fltr.append("1") if level & 2: fltr.append("2") if level & 4: fltr.append("4") if level & 8: fltr.append("8") if level & 16: fltr.append("16") if not fltr: return "()" return "(nsfwLevel=" + " OR nsfwLevel=".join(fltr) + ")" 
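# Illustrative note (not part of the original file): '_generate_filter()'
# above turns the 'nsfw' bitmask into a Meilisearch filter string by testing
# each bit (1, 2, 4, 8, 16). A rough sketch of the expected results, assuming
# the method behaves exactly as written:
#
#   _generate_filter(31) -> "(nsfwLevel=1 OR nsfwLevel=2 OR nsfwLevel=4 OR nsfwLevel=8 OR nsfwLevel=16)"
#   _generate_filter(1)  -> "(nsfwLevel=1)"
#   _generate_filter(0)  -> "()"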
gallery_dl-1.30.2/gallery_dl/extractor/comick.py

# -*- coding: utf-8 -*-

# Copyright 2025 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for https://comick.io/"""

from .common import ChapterExtractor, MangaExtractor, Message
from ..
import text from ..cache import memcache BASE_PATTERN = r"(?:https?://)?(?:www\.)?comick\.io" class ComickBase(): """Base class for comick.io extractors""" category = "comick" root = "https://comick.io" @memcache(keyarg=1) def _manga_info(self, slug): url = f"{self.root}/comic/{slug}" page = self.request(url).text data = self._extract_nextdata(page) props = data["props"]["pageProps"] comic = props["comic"] genre = [] theme = [] format = "" for item in comic["md_comic_md_genres"]: item = item["md_genres"] group = item["group"] if group == "Genre": genre.append(item["name"]) elif group == "Theme": theme.append(item["name"]) else: format = item["name"] if mu := comic["mu_comics"]: tags = [c["mu_categories"]["title"] for c in mu["mu_comic_categories"]] publisher = [p["mu_publishers"]["title"] for p in mu["mu_comic_publishers"]] else: tags = publisher = () return { "manga": comic["title"], "manga_id": comic["id"], "manga_hid": comic["hid"], "manga_slug": slug, "manga_titles": [t["title"] for t in comic["md_titles"]], "artist": [a["name"] for a in props["artists"]], "author": [a["name"] for a in props["authors"]], "genre" : genre, "theme" : theme, "format": format, "tags" : tags, "publisher": publisher, "published": text.parse_int(comic["year"]), "description": comic["desc"], "demographic": props["demographic"], "origin": comic["iso639_1"], "mature": props["matureContent"], "rating": comic["content_rating"], "rank" : comic["follow_rank"], "score" : text.parse_float(comic["bayesian_rating"]), "status": "Complete" if comic["status"] == 2 else "Ongoing", "links" : comic["links"], "_build_id": data["buildId"], } def _chapter_info(self, manga, chstr): slug = manga['manga_slug'] url = (f"{self.root}/_next/data/{manga['_build_id']}" f"/comic/{slug}/{chstr}.json") params = {"slug": slug, "chapter": chstr} return self.request_json(url, params=params)["pageProps"] class ComickChapterExtractor(ComickBase, ChapterExtractor): """Extractor for comick.io manga chapters""" archive_fmt = "{chapter_hid}_{page}" pattern = BASE_PATTERN + r"/comic/([\w-]+)/(\w+-chapter-[^/?#]+)" example = "https://comick.io/comic/MANGA/ID-chapter-123-en" def metadata(self, page): slug, chstr = self.groups manga = self._manga_info(slug) props = self._chapter_info(manga, chstr) ch = props["chapter"] self._images = ch["md_images"] chapter, sep, minor = ch["chap"].partition(".") return { **manga, "title" : props["chapTitle"], "volume" : text.parse_int(ch["vol"]), "chapter" : text.parse_int(chapter), "chapter_minor" : sep + minor, "chapter_id" : ch["id"], "chapter_hid" : ch["hid"], "chapter_string": chstr, "group" : ch["group_name"], "date" : text.parse_datetime( ch["created_at"][:19], "%Y-%m-%dT%H:%M:%S"), "date_updated" : text.parse_datetime( ch["updated_at"][:19], "%Y-%m-%dT%H:%M:%S"), "lang" : ch["lang"], } def images(self, page): return [ ("https://meo.comick.pictures/" + img["b2key"], { "width" : img["w"], "height" : img["h"], "size" : img["s"], "optimized": img["optimized"], }) for img in self._images ] class ComickMangaExtractor(ComickBase, MangaExtractor): """Extractor for comick.io manga""" pattern = BASE_PATTERN + r"/comic/([\w-]+)/?(?:\?([^#]+))?" 
example = "https://comick.io/comic/MANGA" def items(self): slug = self.groups[0] manga = self._manga_info(slug) for ch in self.chapters(manga): url = (f"{self.root}/comic/{slug}" f"/{ch['hid']}-chapter-{ch['chap']}-{ch['lang']}") ch.update(manga) chapter, sep, minor = ch["chap"].partition(".") ch["chapter"] = text.parse_int(chapter) ch["chapter_minor"] = sep + minor ch["_extractor"] = ComickChapterExtractor yield Message.Queue, url, ch def chapters(self, manga): info = True slug, query = self.groups url = f"https://api.comick.io/comic/{manga['manga_hid']}/chapters" headers = { "Origin": "https://comick.io", "Sec-Fetch-Dest": "empty", "Sec-Fetch-Mode": "cors", "Sec-Fetch-Site": "same-site", } query = text.parse_query(query) params = {"lang": query.get("lang") or None} params["page"] = page = text.parse_int(query.get("page"), 1) if date_order := query.get("date-order"): params["date-order"] = date_order elif chap_order := query.get("chap-order"): params["chap-order"] = chap_order else: params["chap-order"] = \ "0" if self.config("chapter-reverse", False) else "1" group = query.get("group", None) if group == "0": group = None while True: data = self.request_json(url, params=params, headers=headers) limit = data["limit"] if info: info = False total = data["total"] - limit * page if total > limit: self.log.info("Collecting %s chapters", total) if group is None: yield from data["chapters"] else: for ch in data["chapters"]: if group in ch["group_name"]: yield ch if data["total"] <= limit * page: return params["page"] = page = page + 1 ��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1753336256.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.30.2/gallery_dl/extractor/comicvine.py�������������������������������������������������0000644�0001750�0001750�00000003771�15040344700�021073� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2021-2025 Mike Fährmann # # 
gallery_dl-1.30.2/gallery_dl/extractor/comicvine.py

# -*- coding: utf-8 -*-

# Copyright 2021-2025 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for https://comicvine.gamespot.com/"""

from .booru import BooruExtractor
from .. import text

import operator


class ComicvineTagExtractor(BooruExtractor):
    """Extractor for a gallery on comicvine.gamespot.com"""
    category = "comicvine"
    subcategory = "tag"
    basecategory = ""
    root = "https://comicvine.gamespot.com"
    per_page = 1000
    directory_fmt = ("{category}", "{tag}")
    filename_fmt = "{filename}.{extension}"
    archive_fmt = "{id}"
    pattern = (r"(?:https?://)?comicvine\.gamespot\.com"
               r"(/([^/?#]+)/(\d+-\d+)/images/.*)")
    example = "https://comicvine.gamespot.com/TAG/123-45/images/"

    def __init__(self, match):
        BooruExtractor.__init__(self, match)
        self.path, self.object_name, self.object_id = match.groups()

    def metadata(self):
        return {"tag": text.unquote(self.object_name)}

    def posts(self):
        url = self.root + "/js/image-data.json"
        params = {
            "images": text.extract(
                self.request(self.root + self.path).text,
                'data-gallery-id="', '"')[0],
            "start" : self.page_start,
            "count" : self.per_page,
            "object": self.object_id,
        }

        while True:
            images = self.request_json(url, params=params)["images"]
            yield from images
            if len(images) < self.per_page:
                return
            params["start"] += self.per_page

    def skip(self, num):
        self.page_start = num
        return num

    _file_url = operator.itemgetter("original")

    def _prepare(self, post):
        post["date"] = text.parse_datetime(
            post["dateCreated"], "%a, %b %d %Y")
        post["tags"] = [tag["name"] for tag in post["tags"] if tag["name"]]

gallery_dl-1.30.2/gallery_dl/extractor/common.py

# -*- coding: utf-8 -*-

# Copyright 2014-2025 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.
"""Common classes and constants used by extractor modules.""" import os import re import ssl import time import netrc import queue import random import getpass import logging import requests import threading from datetime import datetime from xml.etree import ElementTree from requests.adapters import HTTPAdapter from .message import Message from .. import config, output, text, util, cache, exception urllib3 = requests.packages.urllib3 class Extractor(): category = "" subcategory = "" basecategory = "" categorytransfer = False directory_fmt = ("{category}",) filename_fmt = "{filename}.{extension}" archive_fmt = "" status = 0 root = "" cookies_domain = "" cookies_index = 0 referer = True ciphers = None tls12 = True browser = None useragent = util.USERAGENT_FIREFOX request_interval = 0.0 request_interval_min = 0.0 request_interval_429 = 60.0 request_timestamp = 0.0 def __init__(self, match): self.log = logging.getLogger(self.category) self.url = match.string self.match = match self.groups = match.groups() self.kwdict = {} if self.category in CATEGORY_MAP: catsub = f"{self.category}:{self.subcategory}" if catsub in CATEGORY_MAP: self.category, self.subcategory = CATEGORY_MAP[catsub] else: self.category = CATEGORY_MAP[self.category] self._cfgpath = ("extractor", self.category, self.subcategory) self._parentdir = "" @classmethod def from_url(cls, url): if isinstance(cls.pattern, str): cls.pattern = util.re_compile(cls.pattern) match = cls.pattern.match(url) return cls(match) if match else None def __iter__(self): self.initialize() return self.items() def initialize(self): self._init_options() self._init_session() self._init_cookies() self._init() self.initialize = util.noop def finalize(self): pass def items(self): yield Message.Version, 1 def skip(self, num): return 0 def config(self, key, default=None): return config.interpolate(self._cfgpath, key, default) def config2(self, key, key2, default=None, sentinel=util.SENTINEL): value = self.config(key, sentinel) if value is not sentinel: return value return self.config(key2, default) def config_deprecated(self, key, deprecated, default=None, sentinel=util.SENTINEL, history=set()): value = self.config(deprecated, sentinel) if value is not sentinel: if deprecated not in history: history.add(deprecated) self.log.warning("'%s' is deprecated. 
Use '%s' instead.", deprecated, key) default = value value = self.config(key, sentinel) if value is not sentinel: return value return default def config_accumulate(self, key): return config.accumulate(self._cfgpath, key) def config_instance(self, key, default=None): return default def _config_shared(self, key, default=None): return config.interpolate_common( ("extractor",), self._cfgpath, key, default) def _config_shared_accumulate(self, key): first = True extr = ("extractor",) for path in self._cfgpath: if first: first = False values = config.accumulate(extr + path, key) elif conf := config.get(extr, path[0]): values[:0] = config.accumulate( (self.subcategory,), key, conf=conf) return values def request(self, url, method="GET", session=None, retries=None, retry_codes=None, encoding=None, fatal=True, notfound=None, **kwargs): if session is None: session = self.session if retries is None: retries = self._retries if retry_codes is None: retry_codes = self._retry_codes if "proxies" not in kwargs: kwargs["proxies"] = self._proxies if "timeout" not in kwargs: kwargs["timeout"] = self._timeout if "verify" not in kwargs: kwargs["verify"] = self._verify if "json" in kwargs: if (json := kwargs["json"]) is not None: kwargs["data"] = util.json_dumps(json).encode() del kwargs["json"] if headers := kwargs.get("headers"): headers["Content-Type"] = "application/json" else: kwargs["headers"] = {"Content-Type": "application/json"} response = challenge = None tries = 1 if self._interval: seconds = (self._interval() - (time.time() - Extractor.request_timestamp)) if seconds > 0.0: self.sleep(seconds, "request") while True: try: response = session.request(method, url, **kwargs) except requests.exceptions.ConnectionError as exc: try: reason = exc.args[0].reason cls = reason.__class__.__name__ pre, _, err = str(reason.args[-1]).partition(":") msg = f" {cls}: {(err or pre).lstrip()}" except Exception: msg = exc code = 0 except (requests.exceptions.Timeout, requests.exceptions.ChunkedEncodingError, requests.exceptions.ContentDecodingError) as exc: msg = exc code = 0 except (requests.exceptions.RequestException) as exc: msg = exc break else: code = response.status_code if self._write_pages: self._dump_response(response) if ( code < 400 or code < 500 and ( not fatal and code != 429 or fatal is None) or fatal is ... 
): if encoding: response.encoding = encoding return response if notfound and code == 404: self.status |= exception.NotFoundError.code raise exception.NotFoundError(notfound) msg = f"'{code} {response.reason}' for '{response.url}'" challenge = util.detect_challenge(response) if challenge is not None: self.log.warning(challenge) if code == 429 and self._handle_429(response): continue elif code == 429 and self._interval_429: pass elif code not in retry_codes and code < 500: break finally: Extractor.request_timestamp = time.time() self.log.debug("%s (%s/%s)", msg, tries, retries+1) if tries > retries: break seconds = tries if self._interval: s = self._interval() if seconds < s: seconds = s if code == 429 and self._interval_429: s = self._interval_429() if seconds < s: seconds = s self.wait(seconds=seconds, reason="429 Too Many Requests") else: self.sleep(seconds, "retry") tries += 1 if not fatal or fatal is ...: self.log.warning(msg) return util.NullResponse(url, msg) if challenge is None: exc = exception.HttpError(msg, response) else: exc = exception.ChallengeError(challenge, response) self.status |= exc.code raise exc def request_location(self, url, **kwargs): kwargs.setdefault("method", "HEAD") kwargs.setdefault("allow_redirects", False) return self.request(url, **kwargs).headers.get("location", "") def request_json(self, url, **kwargs): response = self.request(url, **kwargs) try: return util.json_loads(response.text) except Exception as exc: fatal = kwargs.get("fatal", True) if not fatal or fatal is ...: if challenge := util.detect_challenge(response): self.log.warning(challenge) else: self.log.warning("%s: %s", exc.__class__.__name__, exc) return {} raise def request_xml(self, url, xmlns=True, **kwargs): response = self.request(url, **kwargs) if xmlns: text = response.text else: text = response.text.replace(" xmlns=", " ns=") parser = ElementTree.XMLParser() try: parser.feed(text) return parser.close() except Exception as exc: fatal = kwargs.get("fatal", True) if not fatal or fatal is ...: if challenge := util.detect_challenge(response): self.log.warning(challenge) else: self.log.warning("%s: %s", exc.__class__.__name__, exc) return ElementTree.Element("") raise _handle_429 = util.false def wait(self, seconds=None, until=None, adjust=1.0, reason="rate limit"): now = time.time() if seconds: seconds = float(seconds) until = now + seconds elif until: if isinstance(until, datetime): # convert to UTC timestamp until = util.datetime_to_timestamp(until) else: until = float(until) seconds = until - now else: raise ValueError("Either 'seconds' or 'until' is required") seconds += adjust if seconds <= 0.0: return if reason: t = datetime.fromtimestamp(until).time() isotime = f"{t.hour:02}:{t.minute:02}:{t.second:02}" self.log.info("Waiting until %s (%s)", isotime, reason) time.sleep(seconds) def sleep(self, seconds, reason): self.log.debug("Sleeping %.2f seconds (%s)", seconds, reason) time.sleep(seconds) def input(self, prompt, echo=True): self._check_input_allowed(prompt) if echo: try: return input(prompt) except (EOFError, OSError): return None else: return getpass.getpass(prompt) def _check_input_allowed(self, prompt=""): input = self.config("input") if input is None: input = output.TTY_STDIN if not input: raise exception.AbortExtraction( f"User input required ({prompt.strip(' :')})") def _get_auth_info(self): """Return authentication information as (username, password) tuple""" username = self.config("username") password = None if username: password = self.config("password") if not password: 
self._check_input_allowed("password") password = util.LazyPrompt() elif self.config("netrc", False): try: info = netrc.netrc().authenticators(self.category) username, _, password = info except (OSError, netrc.NetrcParseError) as exc: self.log.error("netrc: %s", exc) except TypeError: self.log.warning("netrc: No authentication info") return username, password def _init(self): pass def _init_options(self): self._write_pages = self.config("write-pages", False) self._retry_codes = self.config("retry-codes") self._retries = self.config("retries", 4) self._timeout = self.config("timeout", 30) self._verify = self.config("verify", True) self._proxies = util.build_proxy_map(self.config("proxy"), self.log) self._interval = util.build_duration_func( self.config("sleep-request", self.request_interval), self.request_interval_min, ) self._interval_429 = util.build_duration_func( self.config("sleep-429", self.request_interval_429), ) if self._retries < 0: self._retries = float("inf") if not self._retry_codes: self._retry_codes = () def _init_session(self): self.session = session = requests.Session() headers = session.headers headers.clear() ssl_options = ssl_ciphers = 0 # .netrc Authorization headers are alwsays disabled session.trust_env = True if self.config("proxy-env", True) else False browser = self.config("browser") if browser is None: browser = self.browser if browser and isinstance(browser, str): browser, _, platform = browser.lower().partition(":") if not platform or platform == "auto": platform = ("Windows NT 10.0; Win64; x64" if util.WINDOWS else "X11; Linux x86_64") elif platform == "windows": platform = "Windows NT 10.0; Win64; x64" elif platform == "linux": platform = "X11; Linux x86_64" elif platform == "macos": platform = "Macintosh; Intel Mac OS X 15.5" if browser == "chrome": if platform.startswith("Macintosh"): platform = platform.replace(".", "_") else: browser = "firefox" for key, value in HEADERS[browser]: if value and "{}" in value: headers[key] = value.replace("{}", platform) else: headers[key] = value ssl_options |= (ssl.OP_NO_SSLv2 | ssl.OP_NO_SSLv3 | ssl.OP_NO_TLSv1 | ssl.OP_NO_TLSv1_1) ssl_ciphers = CIPHERS[browser] else: headers["User-Agent"] = self.useragent headers["Accept"] = "*/*" headers["Accept-Language"] = "en-US,en;q=0.5" ssl_ciphers = self.ciphers if ssl_ciphers is not None and ssl_ciphers in CIPHERS: ssl_ciphers = CIPHERS[ssl_ciphers] if BROTLI: headers["Accept-Encoding"] = "gzip, deflate, br" else: headers["Accept-Encoding"] = "gzip, deflate" if ZSTD: headers["Accept-Encoding"] += ", zstd" if referer := self.config("referer", self.referer): if isinstance(referer, str): headers["Referer"] = referer elif self.root: headers["Referer"] = self.root + "/" custom_ua = self.config("user-agent") if custom_ua is None or custom_ua == "auto": pass elif custom_ua == "browser": headers["User-Agent"] = _browser_useragent() elif self.useragent is Extractor.useragent and not self.browser or \ custom_ua is not config.get(("extractor",), "user-agent"): headers["User-Agent"] = custom_ua if custom_headers := self.config("headers"): if isinstance(custom_headers, str): if custom_headers in HEADERS: custom_headers = HEADERS[custom_headers] else: self.log.error("Invalid 'headers' value '%s'", custom_headers) custom_headers = () headers.update(custom_headers) if custom_ciphers := self.config("ciphers"): if isinstance(custom_ciphers, list): ssl_ciphers = ":".join(custom_ciphers) elif custom_ciphers in CIPHERS: ssl_ciphers = CIPHERS[custom_ciphers] else: ssl_ciphers = custom_ciphers if 
source_address := self.config("source-address"): if isinstance(source_address, str): source_address = (source_address, 0) else: source_address = (source_address[0], source_address[1]) tls12 = self.config("tls12") if tls12 is None: tls12 = self.tls12 if not tls12: ssl_options |= ssl.OP_NO_TLSv1_2 self.log.debug("TLS 1.2 disabled.") if self.config("truststore"): try: from truststore import SSLContext as ssl_ctx except ImportError as exc: self.log.error("%s: %s", exc.__class__.__name__, exc) ssl_ctx = None else: ssl_ctx = None adapter = _build_requests_adapter( ssl_options, ssl_ciphers, ssl_ctx, source_address) session.mount("https://", adapter) session.mount("http://", adapter) def _init_cookies(self): """Populate the session's cookiejar""" self.cookies = self.session.cookies self.cookies_file = None if self.cookies_domain is None: return if cookies := self.config("cookies"): if select := self.config("cookies-select"): if select == "rotate": cookies = cookies[self.cookies_index % len(cookies)] Extractor.cookies_index += 1 else: cookies = random.choice(cookies) self.cookies_load(cookies) def cookies_load(self, cookies_source): if isinstance(cookies_source, dict): self.cookies_update_dict(cookies_source, self.cookies_domain) elif isinstance(cookies_source, str): path = util.expand_path(cookies_source) try: with open(path) as fp: cookies = util.cookiestxt_load(fp) except Exception as exc: self.log.warning("cookies: Failed to load '%s' (%s: %s)", cookies_source, exc.__class__.__name__, exc) else: self.log.debug("cookies: Loading cookies from '%s'", cookies_source) set_cookie = self.cookies.set_cookie for cookie in cookies: set_cookie(cookie) self.cookies_file = path elif isinstance(cookies_source, (list, tuple)): key = tuple(cookies_source) cookies = CACHE_COOKIES.get(key) if cookies is None: from ..cookies import load_cookies try: cookies = load_cookies(cookies_source) except Exception as exc: self.log.warning("cookies: %s", exc) cookies = () else: CACHE_COOKIES[key] = cookies else: self.log.debug("cookies: Using cached cookies from %s", key) set_cookie = self.cookies.set_cookie for cookie in cookies: set_cookie(cookie) else: self.log.error( "cookies: Expected 'dict', 'list', or 'str' value for " "'cookies' option, got '%s' instead (%r)", cookies_source.__class__.__name__, cookies_source) def cookies_store(self): """Store the session's cookies in a cookies.txt file""" export = self.config("cookies-update", True) if not export: return if isinstance(export, str): path = util.expand_path(export) else: path = self.cookies_file if not path: return path_tmp = path + ".tmp" try: with open(path_tmp, "w") as fp: util.cookiestxt_store(fp, self.cookies) os.replace(path_tmp, path) except OSError as exc: self.log.error("cookies: Failed to write to '%s' " "(%s: %s)", path, exc.__class__.__name__, exc) def cookies_update(self, cookies, domain=""): """Update the session's cookiejar with 'cookies'""" if isinstance(cookies, dict): self.cookies_update_dict(cookies, domain or self.cookies_domain) else: set_cookie = self.cookies.set_cookie try: cookies = iter(cookies) except TypeError: set_cookie(cookies) else: for cookie in cookies: set_cookie(cookie) def cookies_update_dict(self, cookiedict, domain): """Update cookiejar with name-value pairs from a dict""" set_cookie = self.cookies.set for name, value in cookiedict.items(): set_cookie(name, value, domain=domain) def cookies_check(self, cookies_names, domain=None, subdomains=False): """Check if all 'cookies_names' are in the session's cookiejar""" if not 
self.cookies: return False if domain is None: domain = self.cookies_domain names = set(cookies_names) now = time.time() for cookie in self.cookies: if cookie.name not in names: continue if not domain or cookie.domain == domain: pass elif not subdomains or not cookie.domain.endswith(domain): continue if cookie.expires: diff = int(cookie.expires - now) if diff <= 0: self.log.warning( "cookies: %s/%s expired at %s", cookie.domain.lstrip("."), cookie.name, datetime.fromtimestamp(cookie.expires)) continue elif diff <= 86400: hours = diff // 3600 self.log.warning( "cookies: %s/%s will expire in less than %s hour%s", cookie.domain.lstrip("."), cookie.name, hours + 1, "s" if hours else "") names.discard(cookie.name) if not names: return True return False def _extract_jsonld(self, page): return util.json_loads(text.extr( page, '<script type="application/ld+json">', "</script>")) def _extract_nextdata(self, page): return util.json_loads(text.extr( page, ' id="__NEXT_DATA__" type="application/json">', "</script>")) def _cache(self, func, maxage, keyarg=None): # return cache.DatabaseCacheDecorator(func, maxage, keyarg) return cache.DatabaseCacheDecorator(func, keyarg, maxage) def _cache_memory(self, func, maxage=None, keyarg=None): return cache.Memcache() def _get_date_min_max(self, dmin=None, dmax=None): """Retrieve and parse 'date-min' and 'date-max' config values""" def get(key, default): ts = self.config(key, default) if isinstance(ts, str): try: ts = int(datetime.strptime(ts, fmt).timestamp()) except ValueError as exc: self.log.warning("Unable to parse '%s': %s", key, exc) ts = default return ts fmt = self.config("date-format", "%Y-%m-%dT%H:%M:%S") return get("date-min", dmin), get("date-max", dmax) @classmethod def _dump(cls, obj): util.dump_json(obj, ensure_ascii=False, indent=2) def _dump_response(self, response, history=True): """Write the response content to a .txt file in the current directory. 
The file name is derived from the response url, replacing special characters with "_" """ if history: for resp in response.history: self._dump_response(resp, False) if hasattr(Extractor, "_dump_index"): Extractor._dump_index += 1 else: Extractor._dump_index = 1 Extractor._dump_sanitize = util.re_compile( r"[\\\\|/<>:\"?*&=#]+").sub fname = (f"{Extractor._dump_index:>02}_" f"{Extractor._dump_sanitize('_', response.url)}") if util.WINDOWS: path = os.path.abspath(fname)[:255] else: path = fname[:251] try: with open(path + ".txt", 'wb') as fp: util.dump_response( response, fp, headers=(self._write_pages in ("all", "ALL")), hide_auth=(self._write_pages != "ALL") ) self.log.info("Writing '%s' response to '%s'", response.url, path + ".txt") except Exception as e: self.log.warning("Failed to dump HTTP request (%s: %s)", e.__class__.__name__, e) class GalleryExtractor(Extractor): subcategory = "gallery" filename_fmt = "{category}_{gallery_id}_{num:>03}.{extension}" directory_fmt = ("{category}", "{gallery_id} {title}") archive_fmt = "{gallery_id}_{num}" enum = "num" def __init__(self, match, url=None): Extractor.__init__(self, match) if url is None and (path := self.groups[0]) and path[0] == "/": self.page_url = f"{self.root}{path}" else: self.page_url = url def items(self): self.login() if self.page_url: page = self.request( self.page_url, notfound=self.subcategory).text else: page = None data = self.metadata(page) imgs = self.images(page) assets = self.assets(page) if "count" in data: if self.config("page-reverse"): images = util.enumerate_reversed(imgs, 1, data["count"]) else: images = zip( range(1, data["count"]+1), imgs, ) else: enum = enumerate try: data["count"] = len(imgs) except TypeError: pass else: if self.config("page-reverse"): enum = util.enumerate_reversed images = enum(imgs, 1) yield Message.Directory, data enum_key = self.enum if assets: for asset in assets: url = asset["url"] asset.update(data) asset[enum_key] = 0 if "extension" not in asset: text.nameext_from_url(url, asset) yield Message.Url, url, asset for data[enum_key], (url, imgdata) in images: if imgdata: data.update(imgdata) if "extension" not in imgdata: text.nameext_from_url(url, data) else: text.nameext_from_url(url, data) yield Message.Url, url, data def login(self): """Login and set necessary cookies""" def metadata(self, page): """Return a dict with general metadata""" def images(self, page): """Return a list or iterable of all (image-url, metadata)-tuples""" def assets(self, page): """Return an iterable of additional gallery assets Each asset must be a 'dict' containing at least 'url' and 'type' """ class ChapterExtractor(GalleryExtractor): subcategory = "chapter" directory_fmt = ( "{category}", "{manga}", "{volume:?v/ />02}c{chapter:>03}{chapter_minor:?//}{title:?: //}") filename_fmt = ( "{manga}_c{chapter:>03}{chapter_minor:?//}_{page:>03}.{extension}") archive_fmt = ( "{manga}_{chapter}{chapter_minor}_{page}") enum = "page" class MangaExtractor(Extractor): subcategory = "manga" categorytransfer = True chapterclass = None reverse = True def __init__(self, match, url=None): Extractor.__init__(self, match) if url is None and (path := self.groups[0]) and path[0] == "/": self.page_url = f"{self.root}{path}" else: self.page_url = url if self.config("chapter-reverse", False): self.reverse = not self.reverse def items(self): self.login() if self.page_url: page = self.request(self.page_url, notfound=self.subcategory).text else: page = None chapters = self.chapters(page) if self.reverse: chapters.reverse() for chapter, 
data in chapters: data["_extractor"] = self.chapterclass yield Message.Queue, chapter, data def login(self): """Login and set necessary cookies""" def chapters(self, page): """Return a list of all (chapter-url, metadata)-tuples""" class Dispatch(): subcategory = "user" cookies_domain = None finalize = Extractor.finalize skip = Extractor.skip def __iter__(self): return self.items() def initialize(self): pass def _dispatch_extractors(self, extractor_data, default=(), alt=None): extractors = { data[0].subcategory: data for data in extractor_data } if alt is not None: for sub, sub_alt in alt: extractors[sub_alt] = extractors[sub] include = self.config("include", default) or () if include == "all": include = extractors elif isinstance(include, str): include = include.replace(" ", "").split(",") results = [(Message.Version, 1)] for category in include: try: extr, url = extractors[category] except KeyError: self.log.warning("Invalid include '%s'", category) else: results.append((Message.Queue, url, {"_extractor": extr})) return iter(results) class AsynchronousMixin(): """Run info extraction in a separate thread""" def __iter__(self): self.initialize() messages = queue.Queue(5) thread = threading.Thread( target=self.async_items, args=(messages,), daemon=True, ) thread.start() while True: msg = messages.get() if msg is None: thread.join() return if isinstance(msg, Exception): thread.join() raise msg yield msg messages.task_done() def async_items(self, messages): try: for msg in self.items(): messages.put(msg) except Exception as exc: messages.put(exc) messages.put(None) class BaseExtractor(Extractor): instances = () def __init__(self, match): if not self.category: self.groups = match.groups() self.match = match self._init_category() Extractor.__init__(self, match) def _init_category(self): for index, group in enumerate(self.groups): if group is not None: if index: self.category, self.root, info = self.instances[index-1] if not self.root: self.root = text.root_from_url(self.match[0]) self.config_instance = info.get else: self.root = group self.category = group.partition("://")[2] break @classmethod def update(cls, instances): if extra_instances := config.get(("extractor",), cls.basecategory): for category, info in extra_instances.items(): if isinstance(info, dict) and "root" in info: instances[category] = info pattern_list = [] instance_list = cls.instances = [] for category, info in instances.items(): if root := info["root"]: root = root.rstrip("/") instance_list.append((category, root, info)) pattern = info.get("pattern") if not pattern: pattern = re.escape(root[root.index(":") + 3:]) pattern_list.append(pattern + "()") return ( r"(?:" + cls.basecategory + r":(https?://[^/?#]+)|" r"(?:https?://)?(?:" + "|".join(pattern_list) + r"))" ) class RequestsAdapter(HTTPAdapter): def __init__(self, ssl_context=None, source_address=None): self.ssl_context = ssl_context self.source_address = source_address HTTPAdapter.__init__(self) def init_poolmanager(self, *args, **kwargs): kwargs["ssl_context"] = self.ssl_context kwargs["source_address"] = self.source_address return HTTPAdapter.init_poolmanager(self, *args, **kwargs) def proxy_manager_for(self, *args, **kwargs): kwargs["ssl_context"] = self.ssl_context kwargs["source_address"] = self.source_address return HTTPAdapter.proxy_manager_for(self, *args, **kwargs) def _build_requests_adapter( ssl_options, ssl_ciphers, ssl_ctx, source_address): key = (ssl_options, ssl_ciphers, ssl_ctx, source_address) try: return CACHE_ADAPTERS[key] except KeyError: pass if 
ssl_options or ssl_ciphers or ssl_ctx: if ssl_ctx is None: ssl_context = urllib3.connection.create_urllib3_context( options=ssl_options or None, ciphers=ssl_ciphers) if not requests.__version__ < "2.32": # https://github.com/psf/requests/pull/6731 ssl_context.load_verify_locations(requests.certs.where()) else: ssl_ctx_orig = urllib3.util.ssl_.SSLContext try: urllib3.util.ssl_.SSLContext = ssl_ctx ssl_context = urllib3.connection.create_urllib3_context( options=ssl_options or None, ciphers=ssl_ciphers) finally: urllib3.util.ssl_.SSLContext = ssl_ctx_orig ssl_context.check_hostname = False else: ssl_context = None adapter = CACHE_ADAPTERS[key] = RequestsAdapter( ssl_context, source_address) return adapter @cache.cache(maxage=86400) def _browser_useragent(): """Get User-Agent header from default browser""" import webbrowser import socket server = socket.socket(socket.AF_INET, socket.SOCK_STREAM) server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1) server.bind(("127.0.0.1", 0)) server.listen(1) host, port = server.getsockname() webbrowser.open(f"http://{host}:{port}/user-agent") client = server.accept()[0] server.close() for line in client.recv(1024).split(b"\r\n"): key, _, value = line.partition(b":") if key.strip().lower() == b"user-agent": useragent = value.strip() break else: useragent = b"" client.send(b"HTTP/1.1 200 OK\r\n\r\n" + useragent) client.close() return useragent.decode() CACHE_ADAPTERS = {} CACHE_COOKIES = {} CATEGORY_MAP = () HEADERS_FIREFOX_140 = ( ("User-Agent", "Mozilla/5.0 ({}; rv:140.0) Gecko/20100101 Firefox/140.0"), ("Accept", "text/html,application/xhtml+xml," "application/xml;q=0.9,*/*;q=0.8"), ("Accept-Language", "en-US,en;q=0.5"), ("Accept-Encoding", None), ("Connection", "keep-alive"), ("Content-Type", None), ("Content-Length", None), ("Referer", None), ("Origin", None), ("Cookie", None), ("Sec-Fetch-Dest", "empty"), ("Sec-Fetch-Mode", "cors"), ("Sec-Fetch-Site", "same-origin"), ("TE", "trailers"), ) HEADERS_FIREFOX_128 = ( ("User-Agent", "Mozilla/5.0 ({}; rv:128.0) Gecko/20100101 Firefox/128.0"), ("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9," "image/avif,image/webp,image/png,image/svg+xml,*/*;q=0.8"), ("Accept-Language", "en-US,en;q=0.5"), ("Accept-Encoding", None), ("Referer", None), ("Connection", "keep-alive"), ("Upgrade-Insecure-Requests", "1"), ("Cookie", None), ("Sec-Fetch-Dest", "empty"), ("Sec-Fetch-Mode", "no-cors"), ("Sec-Fetch-Site", "same-origin"), ("TE", "trailers"), ) HEADERS_CHROMIUM_138 = ( ("Connection", "keep-alive"), ("sec-ch-ua", '"Not)A;Brand";v="8", "Chromium";v="138"'), ("sec-ch-ua-mobile", "?0"), ("sec-ch-ua-platform", '"Linux"'), ("Upgrade-Insecure-Requests", "1"), ("User-Agent", "Mozilla/5.0 ({}) AppleWebKit/537.36 (KHTML, " "like Gecko) Chrome/138.0.0.0 Safari/537.36"), ("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9," "image/avif,image/webp,image/apng,*/*;q=0.8," "application/signed-exchange;v=b3;q=0.7"), ("Referer", None), ("Sec-Fetch-Site", "same-origin"), ("Sec-Fetch-Mode", "no-cors"), # ("Sec-Fetch-User", "?1"), ("Sec-Fetch-Dest", "empty"), ("Accept-Encoding", None), ("Accept-Language", "en-US,en;q=0.9"), ) HEADERS_CHROMIUM_111 = ( ("Connection", "keep-alive"), ("Upgrade-Insecure-Requests", "1"), ("User-Agent", "Mozilla/5.0 ({}) AppleWebKit/537.36 (KHTML, " "like Gecko) Chrome/111.0.0.0 Safari/537.36"), ("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9," "image/avif,image/webp,image/apng,*/*;q=0.8," "application/signed-exchange;v=b3;q=0.7"), ("Referer", None), 
("Sec-Fetch-Site", "same-origin"), ("Sec-Fetch-Mode", "no-cors"), ("Sec-Fetch-Dest", "empty"), ("Accept-Encoding", None), ("Accept-Language", "en-US,en;q=0.9"), ("cookie", None), ("content-length", None), ) HEADERS = { "firefox" : HEADERS_FIREFOX_140, "firefox/140": HEADERS_FIREFOX_140, "firefox/128": HEADERS_FIREFOX_128, "chrome" : HEADERS_CHROMIUM_138, "chrome/138" : HEADERS_CHROMIUM_138, "chrome/111" : HEADERS_CHROMIUM_111, } CIPHERS_FIREFOX = ( "TLS_AES_128_GCM_SHA256:" "TLS_CHACHA20_POLY1305_SHA256:" "TLS_AES_256_GCM_SHA384:" "ECDHE-ECDSA-AES128-GCM-SHA256:" "ECDHE-RSA-AES128-GCM-SHA256:" "ECDHE-ECDSA-CHACHA20-POLY1305:" "ECDHE-RSA-CHACHA20-POLY1305:" "ECDHE-ECDSA-AES256-GCM-SHA384:" "ECDHE-RSA-AES256-GCM-SHA384:" "ECDHE-ECDSA-AES256-SHA:" "ECDHE-ECDSA-AES128-SHA:" "ECDHE-RSA-AES128-SHA:" "ECDHE-RSA-AES256-SHA:" "AES128-GCM-SHA256:" "AES256-GCM-SHA384:" "AES128-SHA:" "AES256-SHA" ) CIPHERS_CHROMIUM = ( "TLS_AES_128_GCM_SHA256:" "TLS_AES_256_GCM_SHA384:" "TLS_CHACHA20_POLY1305_SHA256:" "ECDHE-ECDSA-AES128-GCM-SHA256:" "ECDHE-RSA-AES128-GCM-SHA256:" "ECDHE-ECDSA-AES256-GCM-SHA384:" "ECDHE-RSA-AES256-GCM-SHA384:" "ECDHE-ECDSA-CHACHA20-POLY1305:" "ECDHE-RSA-CHACHA20-POLY1305:" "ECDHE-RSA-AES128-SHA:" "ECDHE-RSA-AES256-SHA:" "AES128-GCM-SHA256:" "AES256-GCM-SHA384:" "AES128-SHA:" "AES256-SHA" ) CIPHERS = { "firefox" : CIPHERS_FIREFOX, "firefox/140": CIPHERS_FIREFOX, "firefox/128": CIPHERS_FIREFOX, "chrome" : CIPHERS_CHROMIUM, "chrome/138" : CIPHERS_CHROMIUM, "chrome/111" : CIPHERS_CHROMIUM, } # disable Basic Authorization header injection from .netrc data try: requests.sessions.get_netrc_auth = lambda _: None except Exception: pass # detect brotli support try: BROTLI = urllib3.response.brotli is not None except AttributeError: BROTLI = False # detect zstandard support try: ZSTD = urllib3.response.HAS_ZSTD except AttributeError: ZSTD = False # set (urllib3) warnings filter action = config.get((), "warnings", "default") if action: try: import warnings warnings.simplefilter(action, urllib3.exceptions.HTTPWarning) except Exception: pass del action �������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1753336256.0 
gallery_dl-1.30.2/gallery_dl/extractor/cyberdrop.py

# -*- coding: utf-8 -*-

# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for https://cyberdrop.me/"""

from . import lolisafe
from .common import Message
from .. import text

BASE_PATTERN = r"(?:https?://)?(?:www\.)?cyberdrop\.(?:me|to)"


class CyberdropAlbumExtractor(lolisafe.LolisafeAlbumExtractor):
    """Extractor for cyberdrop albums"""
    category = "cyberdrop"
    root = "https://cyberdrop.me"
    root_api = "https://api.cyberdrop.me"
    pattern = BASE_PATTERN + r"/a/([^/?#]+)"
    example = "https://cyberdrop.me/a/ID"

    def items(self):
        files, data = self.fetch_album(self.album_id)

        yield Message.Directory, data
        for data["num"], file in enumerate(files, 1):
            file.update(data)
            text.nameext_from_url(file["name"], file)
            file["name"], sep, file["id"] = file["filename"].rpartition("-")
            yield Message.Url, file["url"], file

    def fetch_album(self, album_id):
        url = f"{self.root}/a/{album_id}"
        page = self.request(url).text
        extr = text.extract_from(page)

        desc = extr('property="og:description" content="', '"')
        if desc.startswith("A privacy-focused censorship-resistant file "
                           "sharing platform free for everyone."):
            desc = ""
        extr('id="title"', "")

        album = {
            "album_id"   : album_id,
            "album_name" : text.unescape(extr('title="', '"')),
            "album_size" : text.parse_bytes(extr(
                '<p class="title">', "B")),
            "date"       : text.parse_datetime(extr(
                '<p class="title">', '<'), "%d.%m.%Y"),
            "description": text.unescape(text.unescape(  # double
                desc.rpartition(" [R")[0])),
        }

        file_ids = list(text.extract_iter(page, 'id="file" href="/f/', '"'))
        album["count"] = len(file_ids)
        return self._extract_files(file_ids), album

    def _extract_files(self, file_ids):
        for file_id in file_ids:
            try:
                url = f"{self.root_api}/api/file/info/{file_id}"
                file = self.request_json(url)

                auth = self.request_json(file["auth_url"])
                file["url"] = auth["url"]
            except Exception as exc:
                self.log.warning("%s (%s: %s)", file_id,
                                 exc.__class__.__name__, exc)
                continue

            yield file


class CyberdropMediaExtractor(CyberdropAlbumExtractor):
    """Extractor for cyberdrop media links"""
    subcategory = "media"
    directory_fmt = ("{category}",)
    pattern = BASE_PATTERN + r"/f/([^/?#]+)"
    example = "https://cyberdrop.me/f/ID"

    def fetch_album(self, album_id):
        return self._extract_files((album_id,)), {
            "album_id"   : "",
            "album_name" : "",
            "album_size" : -1,
            "description": "",
            "count"      : 1,
        }
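# Illustrative note (not part of the original file): '_extract_files()' above
# resolves each file in two steps. A rough sketch of the request flow,
# assuming the endpoints behave as they are used in the code:
#
#   1. GET https://api.cyberdrop.me/api/file/info/<file_id>
#      -> JSON metadata, including an "auth_url"
#   2. GET <auth_url>
#      -> JSON containing the final download "url"
#
# CyberdropMediaExtractor reuses the same helper for single /f/<ID> links by
# passing a one-element tuple of file IDs.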
gallery_dl-1.30.2/gallery_dl/extractor/danbooru.py

# -*- coding: utf-8 -*-

# Copyright 2014-2025 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for https://danbooru.donmai.us/ and other Danbooru instances"""

from .common import BaseExtractor, Message
from ..
import text, util import datetime class DanbooruExtractor(BaseExtractor): """Base class for danbooru extractors""" basecategory = "Danbooru" filename_fmt = "{category}_{id}_{filename}.{extension}" page_limit = 1000 page_start = None per_page = 200 useragent = util.USERAGENT request_interval = (0.5, 1.5) def _init(self): self.ugoira = self.config("ugoira", False) self.external = self.config("external", False) self.includes = False threshold = self.config("threshold") if isinstance(threshold, int): self.threshold = 1 if threshold < 1 else threshold else: self.threshold = self.per_page - 20 username, api_key = self._get_auth_info() if username: self.log.debug("Using HTTP Basic Auth for user '%s'", username) self.session.auth = util.HTTPBasicAuth(username, api_key) def skip(self, num): pages = num // self.per_page if pages >= self.page_limit: pages = self.page_limit - 1 self.page_start = pages + 1 return pages * self.per_page def items(self): # 'includes' initialization must be done here and not in '_init()' # or it'll cause an exception with e621 when 'metadata' is enabled if includes := self.config("metadata"): if isinstance(includes, (list, tuple)): includes = ",".join(includes) elif not isinstance(includes, str): includes = "artist_commentary,children,notes,parent,uploader" self.includes = includes + ",id" data = self.metadata() for post in self.posts(): try: url = post["file_url"] except KeyError: if self.external and post["source"]: post.update(data) yield Message.Directory, post yield Message.Queue, post["source"], post continue text.nameext_from_url(url, post) post["date"] = text.parse_datetime( post["created_at"], "%Y-%m-%dT%H:%M:%S.%f%z") post["tags"] = ( post["tag_string"].split(" ") if post["tag_string"] else ()) post["tags_artist"] = ( post["tag_string_artist"].split(" ") if post["tag_string_artist"] else ()) post["tags_character"] = ( post["tag_string_character"].split(" ") if post["tag_string_character"] else ()) post["tags_copyright"] = ( post["tag_string_copyright"].split(" ") if post["tag_string_copyright"] else ()) post["tags_general"] = ( post["tag_string_general"].split(" ") if post["tag_string_general"] else ()) post["tags_meta"] = ( post["tag_string_meta"].split(" ") if post["tag_string_meta"] else ()) if post["extension"] == "zip": if self.ugoira: post["_ugoira_original"] = False post["_ugoira_frame_data"] = post["frames"] = \ self._ugoira_frames(post) post["_http_adjust_extension"] = False else: url = post["large_file_url"] post["extension"] = "webm" if url[0] == "/": url = self.root + url post.update(data) yield Message.Directory, post yield Message.Url, url, post def items_artists(self): for artist in self.artists(): artist["_extractor"] = DanbooruTagExtractor url = f"{self.root}/posts?tags={text.quote(artist['name'])}" yield Message.Queue, url, artist def metadata(self): return () def posts(self): return () def _pagination(self, endpoint, params, prefix=None): url = self.root + endpoint params["limit"] = self.per_page params["page"] = self.page_start first = True while True: posts = self.request_json(url, params=params) if isinstance(posts, dict): posts = posts["posts"] if posts: if self.includes: params_meta = { "only" : self.includes, "limit": len(posts), "tags" : "id:" + ",".join(str(p["id"]) for p in posts), } data = { meta["id"]: meta for meta in self.request_json(url, params=params_meta) } for post in posts: post.update(data[post["id"]]) if prefix == "a" and not first: posts.reverse() yield from posts if len(posts) < self.threshold: return if prefix: 
params["page"] = f"{prefix}{posts[-1]['id']}" elif params["page"]: params["page"] += 1 else: params["page"] = 2 first = False def _ugoira_frames(self, post): data = self.request_json( f"{self.root}/posts/{post['id']}.json?only=media_metadata" )["media_metadata"]["metadata"] if "Ugoira:FrameMimeType" in data: ext = data["Ugoira:FrameMimeType"].rpartition("/")[2] if ext == "jpeg": ext = "jpg" else: ext = data["ZIP:ZipFileName"].rpartition(".")[2] fmt = ("{:>06}." + ext).format delays = data["Ugoira:FrameDelays"] return [{"file": fmt(index), "delay": delay} for index, delay in enumerate(delays)] def _collection_posts(self, cid, ctype): reverse = prefix = None order = self.config("order-posts") if not order or order in {"asc", "pool", "pool_asc", "asc_pool"}: params = {"tags": f"ord{ctype}:{cid}"} elif order in {"id", "desc_id", "id_desc"}: params = {"tags": f"{ctype}:{cid}"} prefix = "b" elif order in {"desc", "desc_pool", "pool_desc"}: params = {"tags": f"ord{ctype}:{cid}"} reverse = True elif order in {"asc_id", "id_asc"}: params = {"tags": f"{ctype}:{cid}"} reverse = True posts = self._pagination("/posts.json", params, prefix) if reverse: self.log.info("Collecting posts of %s %s", ctype, cid) return self._collection_enumerate_reverse(posts) else: return self._collection_enumerate(posts) def _collection_metadata(self, cid, ctype, cname=None): url = f"{self.root}/{cname or ctype}s/{cid}.json" collection = self.request_json(url) collection["name"] = collection["name"].replace("_", " ") self.post_ids = collection.pop("post_ids", ()) return {ctype: collection} def _collection_enumerate(self, posts): pid_to_num = {pid: num for num, pid in enumerate(self.post_ids, 1)} for post in posts: post["num"] = pid_to_num[post["id"]] yield post def _collection_enumerate_reverse(self, posts): posts = list(posts) posts.reverse() pid_to_num = {pid: num for num, pid in enumerate(self.post_ids, 1)} for post in posts: post["num"] = pid_to_num[post["id"]] return posts BASE_PATTERN = DanbooruExtractor.update({ "danbooru": { "root": None, "pattern": r"(?:(?:danbooru|hijiribe|sonohara|safebooru)\.donmai\.us" r"|donmai\.moe)", }, "atfbooru": { "root": "https://booru.allthefallen.moe", "pattern": r"booru\.allthefallen\.moe", }, "aibooru": { "root": None, "pattern": r"(?:safe\.)?aibooru\.online", }, "booruvar": { "root": "https://booru.borvar.art", "pattern": r"booru\.borvar\.art", }, }) class DanbooruTagExtractor(DanbooruExtractor): """Extractor for danbooru posts from tag searches""" subcategory = "tag" directory_fmt = ("{category}", "{search_tags}") archive_fmt = "t_{search_tags}_{id}" pattern = BASE_PATTERN + r"/posts\?(?:[^&#]*&)*tags=([^&#]*)" example = "https://danbooru.donmai.us/posts?tags=TAG" def metadata(self): self.tags = text.unquote(self.groups[-1].replace("+", " ")) return {"search_tags": self.tags} def posts(self): prefix = "b" for tag in self.tags.split(): if tag.startswith("order:"): if tag == "order:id" or tag == "order:id_asc": prefix = "a" elif tag == "order:id_desc": prefix = "b" else: prefix = None elif tag.startswith( ("id:", "md5:", "ordfav:", "ordfavgroup:", "ordpool:")): prefix = None break return self._pagination("/posts.json", {"tags": self.tags}, prefix) class DanbooruPoolExtractor(DanbooruExtractor): """Extractor for Danbooru pools""" subcategory = "pool" directory_fmt = ("{category}", "pool", "{pool[id]} {pool[name]}") filename_fmt = "{num:>04}_{id}_{filename}.{extension}" archive_fmt = "p_{pool[id]}_{id}" pattern = BASE_PATTERN + r"/pool(?:s|/show)/(\d+)" example = 
"https://danbooru.donmai.us/pools/12345" def metadata(self): self.pool_id = self.groups[-1] return self._collection_metadata(self.pool_id, "pool") def posts(self): return self._collection_posts(self.pool_id, "pool") class DanbooruFavgroupExtractor(DanbooruExtractor): """Extractor for Danbooru favorite groups""" subcategory = "favgroup" directory_fmt = ("{category}", "Favorite Groups", "{favgroup[id]} {favgroup[name]}") filename_fmt = "{num:>04}_{id}_{filename}.{extension}" archive_fmt = "fg_{favgroup[id]}_{id}" pattern = BASE_PATTERN + r"/favorite_group(?:s|/show)/(\d+)" example = "https://danbooru.donmai.us/favorite_groups/12345" def metadata(self): return self._collection_metadata( self.groups[-1], "favgroup", "favorite_group") def posts(self): return self._collection_posts(self.groups[-1], "favgroup") class DanbooruPostExtractor(DanbooruExtractor): """Extractor for single danbooru posts""" subcategory = "post" archive_fmt = "{id}" pattern = BASE_PATTERN + r"/post(?:s|/show)/(\d+)" example = "https://danbooru.donmai.us/posts/12345" def posts(self): url = f"{self.root}/posts/{self.groups[-1]}.json" post = self.request_json(url) if self.includes: params = {"only": self.includes} post.update(self.request_json(url, params=params)) return (post,) class DanbooruPopularExtractor(DanbooruExtractor): """Extractor for popular images from danbooru""" subcategory = "popular" directory_fmt = ("{category}", "popular", "{scale}", "{date}") archive_fmt = "P_{scale[0]}_{date}_{id}" pattern = BASE_PATTERN + r"/(?:explore/posts/)?popular(?:\?([^#]*))?" example = "https://danbooru.donmai.us/explore/posts/popular" def metadata(self): self.params = params = text.parse_query(self.groups[-1]) scale = params.get("scale", "day") date = params.get("date") or datetime.date.today().isoformat() if scale == "week": date = datetime.date.fromisoformat(date) date = (date - datetime.timedelta(days=date.weekday())).isoformat() elif scale == "month": date = date[:-3] return {"date": date, "scale": scale} def posts(self): return self._pagination("/explore/posts/popular.json", self.params) class DanbooruArtistExtractor(DanbooruExtractor): """Extractor for danbooru artists""" subcategory = "artist" pattern = BASE_PATTERN + r"/artists/(\d+)" example = "https://danbooru.donmai.us/artists/12345" items = DanbooruExtractor.items_artists def artists(self): url = f"{self.root}/artists/{self.groups[-1]}.json" return (self.request_json(url),) class DanbooruArtistSearchExtractor(DanbooruExtractor): """Extractor for danbooru artist searches""" subcategory = "artist-search" pattern = BASE_PATTERN + r"/artists/?\?([^#]+)" example = "https://danbooru.donmai.us/artists?QUERY" items = DanbooruExtractor.items_artists def artists(self): url = self.root + "/artists.json" params = text.parse_query(self.groups[-1]) params["page"] = text.parse_int(params.get("page"), 1) while True: artists = self.request_json(url, params=params) yield from artists if len(artists) < 20: return params["page"] += 1 ��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� 
gallery_dl-1.30.2/gallery_dl/extractor/dankefuerslesen.py
# -*- coding: utf-8 -*-

# Copyright 2025 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for https://danke.moe/"""

from .common import ChapterExtractor, MangaExtractor
from .. import text, util
from ..cache import memcache

BASE_PATTERN = r"(?:https?://)?(?:www\.)?danke\.moe"


class DankefuerslesenBase():
    """Base class for dankefuerslesen extractors"""
    category = "dankefuerslesen"
    root = "https://danke.moe"

    @memcache(keyarg=1)
    def _manga_info(self, slug):
        url = f"{self.root}/api/series/{slug}/"
        return self.request_json(url)


class DankefuerslesenChapterExtractor(DankefuerslesenBase, ChapterExtractor):
    """Extractor for Danke fürs Lesen manga chapters"""
    pattern = BASE_PATTERN + r"/read/manga/([\w-]+)/([\w-]+)"
    example = "https://danke.moe/read/manga/TITLE/123/1/"

    def _init(self):
        self.zip = self.config("zip", False)
        if self.zip:
            self.filename_fmt = f"{self.directory_fmt[-1]}.{{extension}}"
            self.directory_fmt = self.directory_fmt[:-1]

    def metadata(self, page):
        slug, ch = self.groups
        manga = self._manga_info(slug)

        if "-" in ch:
            chapter, sep, minor = ch.rpartition("-")
            ch = ch.replace("-", ".")
            minor = "." + minor
        else:
            chapter = ch
            minor = ""

        data = manga["chapters"][ch]
        group_id, self._files = next(iter(data["groups"].items()))

        if not self.zip:
            self.base = (f"{self.root}/media/manga/{slug}/chapters"
                         f"/{data['folder']}/{group_id}/")

        return {
            "manga"     : manga["title"],
            "manga_slug": manga["slug"],
            "title"     : data["title"],
            "volume"    : text.parse_int(data["volume"]),
            "chapter"   : text.parse_int(chapter),
            "chapter_minor": minor,
            "group"     : manga["groups"][group_id].split(" & "),
            "group_id"  : text.parse_int(group_id),
            "date"      : text.parse_timestamp(data["release_date"][group_id]),
            "lang"      : util.NONE,
            "language"  : util.NONE,
        }

    def images(self, page):
        if self.zip:
            return ()
        base = self.base
        return [(base + file, None) for file in self._files]

    def assets(self, page):
        if self.zip:
            slug, ch = self.groups
            url = f"{self.root}/api/download_chapter/{slug}/{ch}/"
            return ({
                "type"     : "archive",
                "extension": "zip",
                "url"      : url,
            },)


class DankefuerslesenMangaExtractor(DankefuerslesenBase, MangaExtractor):
    """Extractor for Danke fürs Lesen manga"""
    chapterclass = DankefuerslesenChapterExtractor
    reverse = False
    pattern = BASE_PATTERN + r"/read/manga/([^/?#]+)"
    example = "https://danke.moe/read/manga/TITLE/"

    def chapters(self, page):
        results = []

        manga = self._manga_info(self.groups[0]).copy()
        manga["lang"] = util.NONE
        manga["language"] = util.NONE

        base = f"{self.root}/read/manga/{manga['slug']}/"
        for ch, data in manga.pop("chapters").items():
            if "." in ch:
                chapter, sep, minor = ch.rpartition(".")
                ch = ch.replace('.', '-')
                data["chapter"] = text.parse_int(chapter)
                data["chapter_minor"] = sep + minor
            else:
                data["chapter"] = text.parse_int(ch)
                data["chapter_minor"] = ""
            manga.update(data)
            results.append((f"{base}{ch}/1/", manga))
        return results
gallery_dl-1.30.2/gallery_dl/extractor/desktopography.py
# -*- coding: utf-8 -*-

# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for https://desktopography.net/"""

from .common import Extractor, Message
from .. import text

BASE_PATTERN = r"(?:https?://)?desktopography\.net"


class DesktopographyExtractor(Extractor):
    """Base class for desktopography extractors"""
    category = "desktopography"
    archive_fmt = "{filename}"
    root = "https://desktopography.net"


class DesktopographySiteExtractor(DesktopographyExtractor):
    """Extractor for all desktopography exhibitions"""
    subcategory = "site"
    pattern = BASE_PATTERN + r"/$"
    example = "https://desktopography.net/"

    def items(self):
        page = self.request(self.root).text
        data = {"_extractor": DesktopographyExhibitionExtractor}

        for exhibition_year in text.extract_iter(
                page,
                '<a href="https://desktopography.net/exhibition-',
                '/">'):
            url = self.root + "/exhibition-" + exhibition_year + "/"
            yield Message.Queue, url, data


class DesktopographyExhibitionExtractor(DesktopographyExtractor):
    """Extractor for a yearly desktopography exhibition"""
    subcategory = "exhibition"
    pattern = BASE_PATTERN + r"/exhibition-([^/?#]+)/"
    example = "https://desktopography.net/exhibition-2020/"

    def __init__(self, match):
        DesktopographyExtractor.__init__(self, match)
        self.year = match[1]

    def items(self):
        url = f"{self.root}/exhibition-{self.year}/"
        base_entry_url = "https://desktopography.net/portfolios/"
        page = self.request(url).text

        data = {
            "_extractor": DesktopographyEntryExtractor,
            "year": self.year,
        }

        for entry_url in text.extract_iter(
                page,
                '<a class="overlay-background" href="' + base_entry_url,
                '">'):
            url = base_entry_url + entry_url
            yield Message.Queue, url, data


class DesktopographyEntryExtractor(DesktopographyExtractor):
    """Extractor for all resolutions of a desktopography wallpaper"""
    subcategory = "entry"
    pattern = BASE_PATTERN + r"/portfolios/([\w-]+)"
    example = "https://desktopography.net/portfolios/NAME/"

    def __init__(self, match):
        DesktopographyExtractor.__init__(self, match)
        self.entry = match[1]

    def items(self):
        url = f"{self.root}/portfolios/{self.entry}"
        page = self.request(url).text

        entry_data = {"entry": self.entry}
        yield Message.Directory, entry_data

        for image_data in text.extract_iter(
                page,
                '<a target="_blank" href="https://desktopography.net',
                '">'):
            path, _, filename = image_data.partition(
                '" class="wallpaper-button" download="')
            text.nameext_from_url(filename, entry_data)
            yield Message.Url, self.root + path, entry_data
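# ---------------------------------------------------------------------------
# Illustrative sketch (not part of gallery-dl): the chapter-slug handling in
# DankefuerslesenChapterExtractor.metadata() above.  A URL component such as
# "12-5" denotes chapter 12.5; the site's "chapters" mapping is keyed by the
# dotted form.  'split_chapter_slug' is a hypothetical helper that only
# mirrors that conversion; the extractor itself uses text.parse_int(), which
# also tolerates non-numeric input.

def split_chapter_slug(slug):
    """Return (api_key, chapter, chapter_minor) for a chapter URL slug."""
    if "-" in slug:
        chapter, _, minor = slug.rpartition("-")
        return slug.replace("-", "."), int(chapter), "." + minor
    return slug, int(slug), ""


# e.g. split_chapter_slug("12-5") -> ("12.5", 12, ".5")
#      split_chapter_slug("7")    -> ("7", 7, "")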
����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.30.2/gallery_dl/extractor/deviantart.py������������������������������������������������0000644�0001750�0001750�00000244247�15040344700�021265� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2015-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.deviantart.com/""" from .common import Extractor, Message, Dispatch from .. import text, util, exception from ..cache import cache, memcache import collections import mimetypes import binascii import time BASE_PATTERN = ( r"(?:https?://)?(?:" r"(?:www\.)?(?:fx)?deviantart\.com/(?!watch/)([\w-]+)|" r"(?!www\.)([\w-]+)\.(?:fx)?deviantart\.com)" ) DEFAULT_AVATAR = "https://a.deviantart.net/avatars/default.gif" class DeviantartExtractor(Extractor): """Base class for deviantart extractors""" category = "deviantart" root = "https://www.deviantart.com" directory_fmt = ("{category}", "{username}") filename_fmt = "{category}_{index}_{title}.{extension}" cookies_domain = ".deviantart.com" cookies_names = ("auth", "auth_secure", "userinfo") _last_request = 0 def __init__(self, match): Extractor.__init__(self, match) self.user = (match[1] or match[2] or "").lower() self.offset = 0 def _init(self): self.jwt = self.config("jwt", False) self.flat = self.config("flat", True) self.extra = self.config("extra", False) self.quality = self.config("quality", "100") self.original = self.config("original", True) self.previews = self.config("previews", False) self.intermediary = self.config("intermediary", True) self.comments_avatars = self.config("comments-avatars", False) self.comments = self.comments_avatars or self.config("comments", False) self.api = DeviantartOAuthAPI(self) self.eclipse_api = None self.group = False self._premium_cache = {} if self.config("auto-unwatch"): self.unwatch = [] self.finalize = self._unwatch_premium else: self.unwatch = None if self.quality: if self.quality == "png": self.quality = "-fullview.png?" 
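                # "png" mode: the substitution below rewrites the trailing
                # "-fullview.<ext>?" segment of a wixmp content URL to
                # "-fullview.png?", presumably to request a PNG rendition
                # instead of the default format.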
self.quality_sub = util.re(r"-fullview\.[a-z0-9]+\?").sub else: self.quality = f",q_{self.quality}" self.quality_sub = util.re(r",q_\d+").sub if self.intermediary: self.intermediary_subn = util.re(r"(/f/[^/]+/[^/]+)/v\d+/.*").subn if isinstance(self.original, str) and \ self.original.lower().startswith("image"): self.original = True self._update_content = self._update_content_image else: self._update_content = self._update_content_default if self.previews == "all": self.previews_images = self.previews = True else: self.previews_images = False journals = self.config("journals", "html") if journals == "html": self.commit_journal = self._commit_journal_html elif journals == "text": self.commit_journal = self._commit_journal_text else: self.commit_journal = None def request(self, url, **kwargs): if "fatal" not in kwargs: kwargs["fatal"] = False while True: response = Extractor.request(self, url, **kwargs) if response.status_code != 403 or \ b"Request blocked." not in response.content: return response self.wait(seconds=300, reason="CloudFront block") def skip(self, num): self.offset += num return num def login(self): if self.cookies_check(self.cookies_names): return True username, password = self._get_auth_info() if username: self.cookies_update(_login_impl(self, username, password)) return True def items(self): if self.user: if group := self.config("group", True): if user := _user_details(self, self.user): self.user = user["username"] self.group = False elif group == "skip": self.log.info("Skipping group '%s'", self.user) raise exception.AbortExtraction() else: self.subcategory = "group-" + self.subcategory self.group = True for deviation in self.deviations(): if isinstance(deviation, tuple): url, data = deviation yield Message.Queue, url, data continue if deviation["is_deleted"]: # prevent crashing in case the deviation really is # deleted self.log.debug( "Skipping %s (deleted)", deviation["deviationid"]) continue tier_access = deviation.get("tier_access") if tier_access == "locked": self.log.debug( "Skipping %s (access locked)", deviation["deviationid"]) continue if "premium_folder_data" in deviation: data = self._fetch_premium(deviation) if not data: continue deviation.update(data) self.prepare(deviation) yield Message.Directory, deviation if "content" in deviation: content = self._extract_content(deviation) yield self.commit(deviation, content) elif deviation["is_downloadable"]: content = self.api.deviation_download(deviation["deviationid"]) deviation["is_original"] = True yield self.commit(deviation, content) if "videos" in deviation and deviation["videos"]: video = max(deviation["videos"], key=lambda x: text.parse_int(x["quality"][:-1])) deviation["is_original"] = False yield self.commit(deviation, video) if "flash" in deviation: deviation["is_original"] = True yield self.commit(deviation, deviation["flash"]) if self.commit_journal: if journal := self._extract_journal(deviation): if self.extra: deviation["_journal"] = journal["html"] deviation["is_original"] = True yield self.commit_journal(deviation, journal) if self.comments_avatars: for comment in deviation["comments"]: user = comment["user"] name = user["username"].lower() if user["usericon"] == DEFAULT_AVATAR: self.log.debug( "Skipping avatar of '%s' (default)", name) continue _user_details.update(name, user) url = f"{self.root}/{name}/avatar/" comment["_extractor"] = DeviantartAvatarExtractor yield Message.Queue, url, comment if self.previews and "preview" in deviation: preview = deviation["preview"] deviation["is_preview"] = True 
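                # previews == "all" keeps a preview for every deviation;
                # any other truthy "previews" setting only keeps the preview
                # when the original file's guessed MIME type is not an image
                # (e.g. flash or archive content), as checked below.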
if self.previews_images: yield self.commit(deviation, preview) else: mtype = mimetypes.guess_type( "a." + deviation["extension"], False)[0] if mtype and not mtype.startswith("image/"): yield self.commit(deviation, preview) del deviation["is_preview"] if not self.extra: continue # ref: https://www.deviantart.com # /developers/http/v1/20210526/object/editor_text # the value of "features" is a JSON string with forward # slashes escaped text_content = \ deviation["text_content"]["body"]["features"].replace( "\\/", "/") if "text_content" in deviation else None for txt in (text_content, deviation.get("description"), deviation.get("_journal")): if txt is None: continue for match in DeviantartStashExtractor.pattern.finditer(txt): url = text.ensure_http_scheme(match[0]) deviation["_extractor"] = DeviantartStashExtractor yield Message.Queue, url, deviation def deviations(self): """Return an iterable containing all relevant Deviation-objects""" def prepare(self, deviation): """Adjust the contents of a Deviation-object""" if "index" not in deviation: try: if deviation["url"].startswith(( "https://www.deviantart.com/stash/", "https://sta.sh", )): filename = deviation["content"]["src"].split("/")[5] deviation["index_base36"] = filename.partition("-")[0][1:] deviation["index"] = id_from_base36( deviation["index_base36"]) else: deviation["index"] = text.parse_int( deviation["url"].rpartition("-")[2]) except KeyError: deviation["index"] = 0 deviation["index_base36"] = "0" if "index_base36" not in deviation: deviation["index_base36"] = base36_from_id(deviation["index"]) if self.user: deviation["username"] = self.user deviation["_username"] = self.user.lower() else: deviation["username"] = deviation["author"]["username"] deviation["_username"] = deviation["username"].lower() deviation["published_time"] = text.parse_int( deviation["published_time"]) deviation["date"] = text.parse_timestamp( deviation["published_time"]) if self.comments: deviation["comments"] = ( self._extract_comments(deviation["deviationid"], "deviation") if deviation["stats"]["comments"] else () ) # filename metadata sub = util.re(r"\W").sub deviation["filename"] = "".join(( sub("_", deviation["title"].lower()), "_by_", sub("_", deviation["author"]["username"].lower()), "-d", deviation["index_base36"], )) def commit(self, deviation, target): url = target["src"] name = target.get("filename") or url target = target.copy() target["filename"] = deviation["filename"] deviation["target"] = target deviation["extension"] = target["extension"] = text.ext_from_url(name) if "is_original" not in deviation: deviation["is_original"] = ("/v1/" not in url) return Message.Url, url, deviation def _commit_journal_html(self, deviation, journal): title = text.escape(deviation["title"]) url = deviation["url"] thumbs = deviation.get("thumbs") or deviation.get("files") html = journal["html"] shadow = SHADOW_TEMPLATE.format_map(thumbs[0]) if thumbs else "" if not html: self.log.warning("%s: Empty journal content", deviation["index"]) if "css" in journal: css, cls = journal["css"], "withskin" elif html.startswith("<style"): css, _, html = html.partition("</style>") css = css.partition(">")[2] cls = "withskin" else: css, cls = "", "journal-green" if html.find('<div class="boxtop journaltop">', 0, 250) != -1: needle = '<div class="boxtop journaltop">' header = HEADER_CUSTOM_TEMPLATE.format( title=title, url=url, date=deviation["date"], ) else: needle = '<div usr class="gr">' username = deviation["author"]["username"] urlname = deviation.get("username") or 
username.lower() header = HEADER_TEMPLATE.format( title=title, url=url, userurl=f"{self.root}/{urlname}/", username=username, date=deviation["date"], ) if needle in html: html = html.replace(needle, header, 1) else: html = JOURNAL_TEMPLATE_HTML_EXTRA.format(header, html) html = JOURNAL_TEMPLATE_HTML.format( title=title, html=html, shadow=shadow, css=css, cls=cls) deviation["extension"] = "htm" return Message.Url, html, deviation def _commit_journal_text(self, deviation, journal): html = journal["html"] if not html: self.log.warning("%s: Empty journal content", deviation["index"]) elif html.startswith("<style"): html = html.partition("</style>")[2] head, _, tail = html.rpartition("<script") content = "\n".join( text.unescape(text.remove_html(txt)) for txt in (head or tail).split("<br />") ) txt = JOURNAL_TEMPLATE_TEXT.format( title=deviation["title"], username=deviation["author"]["username"], date=deviation["date"], content=content, ) deviation["extension"] = "txt" return Message.Url, txt, deviation def _extract_journal(self, deviation): if "excerpt" in deviation: # # empty 'html' # return self.api.deviation_content(deviation["deviationid"]) if "_page" in deviation: page = deviation["_page"] del deviation["_page"] else: page = self._limited_request(deviation["url"]).text # extract journal html from webpage html = text.extr( page, "<h2>Literature Text</h2></span><div>", "</div></section></div></div>") if html: return {"html": html} self.log.debug("%s: Failed to extract journal HTML from webpage. " "Falling back to __INITIAL_STATE__ markup.", deviation["index"]) # parse __INITIAL_STATE__ as fallback state = util.json_loads(text.extr( page, 'window.__INITIAL_STATE__ = JSON.parse("', '");') .replace("\\\\", "\\").replace("\\'", "'").replace('\\"', '"')) deviations = state["@@entities"]["deviation"] content = deviations.popitem()[1]["textContent"] if html := self._textcontent_to_html(deviation, content): return {"html": html} return {"html": content["excerpt"].replace("\n", "<br />")} if "body" in deviation: return {"html": deviation.pop("body")} return None def _textcontent_to_html(self, deviation, content): html = content["html"] markup = html.get("markup") if not markup or markup[0] != "{": return markup if html["type"] == "tiptap": try: return self._tiptap_to_html(markup) except Exception as exc: self.log.debug("", exc_info=exc) self.log.error("%s: '%s: %s'", deviation["index"], exc.__class__.__name__, exc) self.log.warning("%s: Unsupported '%s' markup.", deviation["index"], html["type"]) def _tiptap_to_html(self, markup): html = [] html.append('<div data-editor-viewer="1" ' 'class="_83r8m _2CKTq _3NjDa mDnFl">') data = util.json_loads(markup) for block in data["document"]["content"]: self._tiptap_process_content(html, block) html.append("</div>") return "".join(html) def _tiptap_process_content(self, html, content): type = content["type"] if type == "paragraph": if children := content.get("content"): html.append('<p style="') attrs = content["attrs"] if attrs.get("textAlign"): html.append("text-align:") html.append(attrs["textAlign"]) html.append(";") self._tiptap_process_indentation(html, attrs) html.append('">') for block in children: self._tiptap_process_content(html, block) html.append("</p>") else: html.append('<p class="empty-p"><br/></p>') elif type == "text": self._tiptap_process_text(html, content) elif type == "heading": attrs = content["attrs"] level = str(attrs.get("level") or "3") html.append("<h") html.append(level) html.append(' style="text-align:') 
html.append(attrs.get("textAlign") or "left") html.append('">') html.append('<span style="') self._tiptap_process_indentation(html, attrs) html.append('">') self._tiptap_process_children(html, content) html.append("</span></h") html.append(level) html.append(">") elif type in ("listItem", "bulletList", "orderedList", "blockquote"): c = type[1] tag = ( "li" if c == "i" else "ul" if c == "u" else "ol" if c == "r" else "blockquote" ) html.append("<" + tag + ">") self._tiptap_process_children(html, content) html.append("</" + tag + ">") elif type == "anchor": attrs = content["attrs"] html.append('<a id="') html.append(attrs.get("id") or "") html.append('" data-testid="anchor"></a>') elif type == "hardBreak": html.append("<br/><br/>") elif type == "horizontalRule": html.append("<hr/>") elif type == "da-deviation": self._tiptap_process_deviation(html, content) elif type == "da-mention": user = content["attrs"]["user"]["username"] html.append('<a href="https://www.deviantart.com/') html.append(user.lower()) html.append('" data-da-type="da-mention" data-user="">@<!-- -->') html.append(user) html.append('</a>') elif type == "da-gif": attrs = content["attrs"] width = str(attrs.get("width") or "") height = str(attrs.get("height") or "") url = text.escape(attrs.get("url") or "") html.append('<div data-da-type="da-gif" data-width="') html.append(width) html.append('" data-height="') html.append(height) html.append('" data-alignment="') html.append(attrs.get("alignment") or "") html.append('" data-url="') html.append(url) html.append('" class="t61qu"><video role="img" autoPlay="" ' 'muted="" loop="" style="pointer-events:none" ' 'controlsList="nofullscreen" playsInline="" ' 'aria-label="gif" data-da-type="da-gif" width="') html.append(width) html.append('" height="') html.append(height) html.append('" src="') html.append(url) html.append('" class="_1Fkk6"></video></div>') elif type == "da-video": src = text.escape(content["attrs"].get("src") or "") html.append('<div data-testid="video" data-da-type="da-video" ' 'data-src="') html.append(src) html.append('" class="_1Uxvs"><div data-canfs="yes" data-testid="v' 'ideo-inner" class="main-video" style="width:780px;hei' 'ght:438px"><div style="width:780px;height:438px">' '<video src="') html.append(src) html.append('" style="width:100%;height:100%;" preload="auto" cont' 'rols=""></video></div></div></div>') else: self.log.warning("Unsupported content type '%s'", type) def _tiptap_process_text(self, html, content): if marks := content.get("marks"): close = [] for mark in marks: type = mark["type"] if type == "link": attrs = mark.get("attrs") or {} html.append('<a href="') html.append(text.escape(attrs.get("href") or "")) if "target" in attrs: html.append('" target="') html.append(attrs["target"]) html.append('" rel="') html.append(attrs.get("rel") or "noopener noreferrer nofollow ugc") html.append('">') close.append("</a>") elif type == "bold": html.append("<strong>") close.append("</strong>") elif type == "italic": html.append("<em>") close.append("</em>") elif type == "underline": html.append("<u>") close.append("</u>") elif type == "strike": html.append("<s>") close.append("</s>") elif type == "textStyle" and len(mark) <= 1: pass else: self.log.warning("Unsupported text marker '%s'", type) close.reverse() html.append(text.escape(content["text"])) html.extend(close) else: html.append(text.escape(content["text"])) def _tiptap_process_children(self, html, content): if children := content.get("content"): for block in children: self._tiptap_process_content(html, 
block) def _tiptap_process_indentation(self, html, attrs): itype = ("text-indent" if attrs.get("indentType") == "line" else "margin-inline-start") isize = str((attrs.get("indentation") or 0) * 24) html.append(itype + ":" + isize + "px") def _tiptap_process_deviation(self, html, content): dev = content["attrs"]["deviation"] media = dev.get("media") or () html.append('<div class="jjNX2">') html.append('<figure class="Qf-HY" data-da-type="da-deviation" ' 'data-deviation="" ' 'data-width="" data-link="" data-alignment="center">') if "baseUri" in media: url, formats = self._eclipse_media(media) full = formats["fullview"] html.append('<a href="') html.append(text.escape(dev["url"])) html.append('" class="_3ouD5" style="margin:0 auto;display:flex;' 'align-items:center;justify-content:center;' 'overflow:hidden;width:780px;height:') html.append(str(780 * full["h"] / full["w"])) html.append('px">') html.append('<img src="') html.append(text.escape(url)) html.append('" alt="') html.append(text.escape(dev["title"])) html.append('" style="width:100%;max-width:100%;display:block"/>') html.append("</a>") elif "textContent" in dev: html.append('<div class="_32Hs4" style="width:350px">') html.append('<a href="') html.append(text.escape(dev["url"])) html.append('" class="_3ouD5">') html.append('''\ <section class="Q91qI aG7Yi" style="width:350px;height:313px">\ <div class="_16ECM _1xMkk" aria-hidden="true">\ <svg height="100%" viewBox="0 0 15 12" preserveAspectRatio="xMidYMin slice" \ fill-rule="evenodd">\ <linearGradient x1="87.8481761%" y1="16.3690766%" \ x2="45.4107524%" y2="71.4898596%" id="app-root-3">\ <stop stop-color="#00FF62" offset="0%"></stop>\ <stop stop-color="#3197EF" stop-opacity="0" offset="100%"></stop>\ </linearGradient>\ <text class="_2uqbc" fill="url(#app-root-3)" text-anchor="end" x="15" y="11">J\ </text></svg></div><div class="_1xz9u">Literature</div><h3 class="_2WvKD">\ ''') html.append(text.escape(dev["title"])) html.append('</h3><div class="_2CPLm">') html.append(text.escape(dev["textContent"]["excerpt"])) html.append('</div></section></a></div>') html.append('</figure></div>') def _extract_content(self, deviation): content = deviation["content"] if self.original and deviation["is_downloadable"]: self._update_content(deviation, content) return content if self.jwt: self._update_token(deviation, content) return content if content["src"].startswith("https://images-wixmp-"): if self.intermediary and deviation["index"] <= 790677560: # https://github.com/r888888888/danbooru/issues/4069 intermediary, count = self.intermediary_subn( r"/intermediary\1", content["src"], 1) if count: deviation["is_original"] = False deviation["_fallback"] = (content["src"],) content["src"] = intermediary if self.quality: content["src"] = self.quality_sub( self.quality, content["src"], 1) return content def _find_folder(self, folders, name, uuid): if uuid.isdecimal(): match = util.re( "(?i)" + name.replace("-", "[^a-z0-9]+") + "$").match for folder in folders: if match(folder["name"]): return folder elif folder.get("has_subfolders"): for subfolder in folder["subfolders"]: if match(subfolder["name"]): return subfolder else: for folder in folders: if folder["folderid"] == uuid: return folder elif folder.get("has_subfolders"): for subfolder in folder["subfolders"]: if subfolder["folderid"] == uuid: return subfolder raise exception.NotFoundError("folder") def _folder_urls(self, folders, category, extractor): base = f"{self.root}/{self.user}/{category}/" for folder in folders: folder["_extractor"] = extractor url = 
f"{base}{folder['folderid']}/{folder['name']}" yield url, folder def _update_content_default(self, deviation, content): if "premium_folder_data" in deviation or deviation.get("is_mature"): public = False else: public = None data = self.api.deviation_download(deviation["deviationid"], public) content.update(data) deviation["is_original"] = True def _update_content_image(self, deviation, content): data = self.api.deviation_download(deviation["deviationid"]) url = data["src"].partition("?")[0] mtype = mimetypes.guess_type(url, False)[0] if mtype and mtype.startswith("image/"): content.update(data) deviation["is_original"] = True def _update_token(self, deviation, content): """Replace JWT to be able to remove width/height limits All credit goes to @Ironchest337 for discovering and implementing this method """ url, sep, _ = content["src"].partition("/v1/") if not sep: return # 'images-wixmp' returns 401 errors, but just 'wixmp' still works url = url.replace("//images-wixmp", "//wixmp", 1) # header = b'{"typ":"JWT","alg":"none"}' payload = ( b'{"sub":"urn:app:","iss":"urn:app:","obj":[[{"path":"/f/' + url.partition("/f/")[2].encode() + b'"}]],"aud":["urn:service:file.download"]}' ) deviation["_fallback"] = (content["src"],) deviation["is_original"] = True pl = binascii.b2a_base64(payload).rstrip(b'=\n').decode() content["src"] = ( # base64 of 'header' is precomputed as 'eyJ0eX...' f"{url}?token=eyJ0eXAiOiJKV1QiLCJhbGciOiJub25lIn0.{pl}.") def _extract_comments(self, target_id, target_type="deviation"): results = None comment_ids = [None] while comment_ids: comments = self.api.comments( target_id, target_type, comment_ids.pop()) if results: results.extend(comments) else: results = comments # parent comments, i.e. nodes with at least one child parents = {c["parentid"] for c in comments} # comments with more than one reply replies = {c["commentid"] for c in comments if c["replies"]} # add comment UUIDs with replies that are not parent to any node comment_ids.extend(replies - parents) return results def _limited_request(self, url, **kwargs): """Limits HTTP requests to one every 2 seconds""" diff = time.time() - DeviantartExtractor._last_request if diff < 2.0: self.sleep(2.0 - diff, "request") response = self.request(url, **kwargs) DeviantartExtractor._last_request = time.time() return response def _fetch_premium(self, deviation): try: return self._premium_cache[deviation["deviationid"]] except KeyError: pass if not self.api.refresh_token_key: self.log.warning( "Unable to access premium content (no refresh-token)") self._fetch_premium = lambda _: None return None dev = self.api.deviation(deviation["deviationid"], False) folder = deviation["premium_folder_data"] username = dev["author"]["username"] # premium_folder_data is no longer present when user has access (#5063) has_access = ("premium_folder_data" not in dev) or folder["has_access"] if not has_access and folder["type"] == "watchers" and \ self.config("auto-watch"): if self.unwatch is not None: self.unwatch.append(username) if self.api.user_friends_watch(username): has_access = True self.log.info( "Watching %s for premium folder access", username) else: self.log.warning( "Error when trying to watch %s. 
" "Try again with a new refresh-token", username) if has_access: self.log.info("Fetching premium folder data") else: self.log.warning("Unable to access premium content (type: %s)", folder["type"]) cache = self._premium_cache for dev in self.api.gallery( username, folder["gallery_id"], public=False): cache[dev["deviationid"]] = dev if has_access else None return cache.get(deviation["deviationid"]) def _unwatch_premium(self): for username in self.unwatch: self.log.info("Unwatching %s", username) self.api.user_friends_unwatch(username) def _eclipse_media(self, media, format="preview"): url = [media["baseUri"]] formats = { fmt["t"]: fmt for fmt in media["types"] } if tokens := media.get("token") or (): if len(tokens) <= 1: fmt = formats[format] if "c" in fmt: url.append(fmt["c"].replace( "<prettyName>", media["prettyName"])) url.append("?token=") url.append(tokens[-1]) return "".join(url), formats def _eclipse_to_oauth(self, eclipse_api, deviations): for obj in deviations: deviation = obj["deviation"] if "deviation" in obj else obj deviation_uuid = eclipse_api.deviation_extended_fetch( deviation["deviationId"], deviation["author"]["username"], "journal" if deviation["isJournal"] else "art", )["deviation"]["extended"]["deviationUuid"] yield self.api.deviation(deviation_uuid) def _unescape_json(self, json): return json.replace('\\"', '"') \ .replace("\\'", "'") \ .replace("\\\\", "\\") class DeviantartUserExtractor(Dispatch, DeviantartExtractor): """Extractor for an artist's user profile""" pattern = BASE_PATTERN + r"/?$" example = "https://www.deviantart.com/USER" def items(self): base = f"{self.root}/{self.user}/" return self._dispatch_extractors(( (DeviantartAvatarExtractor , base + "avatar"), (DeviantartBackgroundExtractor, base + "banner"), (DeviantartGalleryExtractor , base + "gallery"), (DeviantartScrapsExtractor , base + "gallery/scraps"), (DeviantartJournalExtractor , base + "posts"), (DeviantartStatusExtractor , base + "posts/statuses"), (DeviantartFavoriteExtractor , base + "favourites"), ), ("gallery",)) ############################################################################### # OAuth ####################################################################### class DeviantartGalleryExtractor(DeviantartExtractor): """Extractor for all deviations from an artist's gallery""" subcategory = "gallery" archive_fmt = "g_{_username}_{index}.{extension}" pattern = (BASE_PATTERN + r"/gallery" r"(?:/all|/recommended-for-you|/?\?catpath=)?/?$") example = "https://www.deviantart.com/USER/gallery/" def deviations(self): if self.flat and not self.group: return self.api.gallery_all(self.user, self.offset) folders = self.api.gallery_folders(self.user) return self._folder_urls(folders, "gallery", DeviantartFolderExtractor) class DeviantartAvatarExtractor(DeviantartExtractor): """Extractor for an artist's avatar""" subcategory = "avatar" archive_fmt = "a_{_username}_{index}" pattern = BASE_PATTERN + r"/avatar" example = "https://www.deviantart.com/USER/avatar/" def deviations(self): name = self.user.lower() user = _user_details(self, name) if not user: return () icon = user["usericon"] if icon == DEFAULT_AVATAR: self.log.debug("Skipping avatar of '%s' (default)", name) return () _, sep, index = icon.rpartition("?") if not sep: index = "0" formats = self.config("formats") if not formats: url = icon.replace("/avatars/", "/avatars-big/", 1) return (self._make_deviation(url, user, index, ""),) if isinstance(formats, str): formats = formats.replace(" ", "").split(",") results = [] for fmt in formats: fmt, 
_, ext = fmt.rpartition(".") if fmt: fmt = "-" + fmt url = (f"https://a.deviantart.net/avatars{fmt}" f"/{name[0]}/{name[1]}/{name}.{ext}?{index}") results.append(self._make_deviation(url, user, index, fmt)) return results def _make_deviation(self, url, user, index, fmt): return { "author" : user, "da_category" : "avatar", "index" : text.parse_int(index), "is_deleted" : False, "is_downloadable": False, "published_time" : 0, "title" : "avatar" + fmt, "stats" : {"comments": 0}, "content" : {"src": url}, } class DeviantartBackgroundExtractor(DeviantartExtractor): """Extractor for an artist's banner""" subcategory = "background" archive_fmt = "b_{index}" pattern = BASE_PATTERN + r"/ba(?:nner|ckground)" example = "https://www.deviantart.com/USER/banner/" def deviations(self): try: return (self.api.user_profile(self.user.lower()) ["cover_deviation"]["cover_deviation"],) except Exception: return () class DeviantartFolderExtractor(DeviantartExtractor): """Extractor for deviations inside an artist's gallery folder""" subcategory = "folder" directory_fmt = ("{category}", "{username}", "{folder[title]}") archive_fmt = "F_{folder[uuid]}_{index}.{extension}" pattern = BASE_PATTERN + r"/gallery/([^/?#]+)/([^/?#]+)" example = "https://www.deviantart.com/USER/gallery/12345/TITLE" def __init__(self, match): DeviantartExtractor.__init__(self, match) self.folder = None self.folder_id = match[3] self.folder_name = match[4] def deviations(self): folders = self.api.gallery_folders(self.user) folder = self._find_folder(folders, self.folder_name, self.folder_id) # Leaving this here for backwards compatibility self.folder = { "title": folder["name"], "uuid" : folder["folderid"], "index": self.folder_id, "owner": self.user, "parent_uuid": folder["parent"], } if folder.get("subfolder"): self.folder["parent_folder"] = folder["parent_folder"] self.archive_fmt = "F_{folder[parent_uuid]}_{index}.{extension}" if self.flat: self.directory_fmt = ("{category}", "{username}", "{folder[parent_folder]}") else: self.directory_fmt = ("{category}", "{username}", "{folder[parent_folder]}", "{folder[title]}") if folder.get("has_subfolders") and self.config("subfolders", True): for subfolder in folder["subfolders"]: subfolder["parent_folder"] = folder["name"] subfolder["subfolder"] = True yield from self._folder_urls( folder["subfolders"], "gallery", DeviantartFolderExtractor) yield from self.api.gallery(self.user, folder["folderid"], self.offset) def prepare(self, deviation): DeviantartExtractor.prepare(self, deviation) deviation["folder"] = self.folder class DeviantartStashExtractor(DeviantartExtractor): """Extractor for sta.sh-ed deviations""" subcategory = "stash" archive_fmt = "{index}.{extension}" pattern = (r"(?:https?://)?(?:(?:www\.)?deviantart\.com/stash|sta\.s(h))" r"/([a-z0-9]+)") example = "https://www.deviantart.com/stash/abcde" skip = Extractor.skip def __init__(self, match): DeviantartExtractor.__init__(self, match) self.user = "" def deviations(self, stash_id=None, stash_data=None): if stash_id is None: legacy_url, stash_id = self.groups else: legacy_url = False if legacy_url and stash_id[0] == "2": url = "https://sta.sh/" + stash_id response = self._limited_request(url) stash_id = response.url.rpartition("/")[2] page = response.text else: url = "https://www.deviantart.com/stash/" + stash_id page = self._limited_request(url).text if stash_id[0] == "0": if uuid := text.extr(page, '//deviation/', '"'): deviation = self.api.deviation(uuid) deviation["_page"] = page deviation["index"] = text.parse_int(text.extr( page, 
'\\"deviationId\\":', ',')) deviation["stash_id"] = stash_id if stash_data: folder = stash_data["folder"] deviation["stash_name"] = folder["name"] deviation["stash_folder"] = folder["folderId"] deviation["stash_parent"] = folder["parentId"] or 0 deviation["stash_description"] = \ folder["richDescription"]["excerpt"] else: deviation["stash_name"] = "" deviation["stash_description"] = "" deviation["stash_folder"] = 0 deviation["stash_parent"] = 0 yield deviation return if stash_data := text.extr(page, ',\\"stash\\":', ',\\"@@'): stash_data = util.json_loads(self._unescape_json(stash_data)) for sid in text.extract_iter( page, 'href="https://www.deviantart.com/stash/', '"'): if sid == stash_id or sid.endswith("#comments"): continue yield from self.deviations(sid, stash_data) class DeviantartFavoriteExtractor(DeviantartExtractor): """Extractor for an artist's favorites""" subcategory = "favorite" directory_fmt = ("{category}", "{username}", "Favourites") archive_fmt = "f_{_username}_{index}.{extension}" pattern = BASE_PATTERN + r"/favourites(?:/all|/?\?catpath=)?/?$" example = "https://www.deviantart.com/USER/favourites/" def deviations(self): if self.flat: return self.api.collections_all(self.user, self.offset) folders = self.api.collections_folders(self.user) return self._folder_urls( folders, "favourites", DeviantartCollectionExtractor) class DeviantartCollectionExtractor(DeviantartExtractor): """Extractor for a single favorite collection""" subcategory = "collection" directory_fmt = ("{category}", "{username}", "Favourites", "{collection[title]}") archive_fmt = "C_{collection[uuid]}_{index}.{extension}" pattern = BASE_PATTERN + r"/favourites/([^/?#]+)/([^/?#]+)" example = "https://www.deviantart.com/USER/favourites/12345/TITLE" def __init__(self, match): DeviantartExtractor.__init__(self, match) self.collection = None self.collection_id = match[3] self.collection_name = match[4] def deviations(self): folders = self.api.collections_folders(self.user) folder = self._find_folder( folders, self.collection_name, self.collection_id) self.collection = { "title": folder["name"], "uuid" : folder["folderid"], "index": self.collection_id, "owner": self.user, } return self.api.collections(self.user, folder["folderid"], self.offset) def prepare(self, deviation): DeviantartExtractor.prepare(self, deviation) deviation["collection"] = self.collection class DeviantartJournalExtractor(DeviantartExtractor): """Extractor for an artist's journals""" subcategory = "journal" directory_fmt = ("{category}", "{username}", "Journal") archive_fmt = "j_{_username}_{index}.{extension}" pattern = BASE_PATTERN + r"/(?:posts(?:/journals)?|journal)/?(?:\?.*)?$" example = "https://www.deviantart.com/USER/posts/journals/" def deviations(self): return self.api.browse_user_journals(self.user, self.offset) class DeviantartStatusExtractor(DeviantartExtractor): """Extractor for an artist's status updates""" subcategory = "status" directory_fmt = ("{category}", "{username}", "Status") filename_fmt = "{category}_{index}_{title}_{date}.{extension}" archive_fmt = "S_{_username}_{index}.{extension}" pattern = BASE_PATTERN + r"/posts/statuses" example = "https://www.deviantart.com/USER/posts/statuses/" def deviations(self): for status in self.api.user_statuses(self.user, self.offset): yield from self.process_status(status) def process_status(self, status): for item in status.get("items") or (): # do not trust is_share # shared deviations/statuses if "deviation" in item: yield item["deviation"].copy() if "status" in item: yield from 
self.process_status(item["status"].copy()) # assume is_deleted == true means necessary fields are missing if status["is_deleted"]: self.log.warning( "Skipping status %s (deleted)", status.get("statusid")) return yield status def prepare(self, deviation): if "deviationid" in deviation: return DeviantartExtractor.prepare(self, deviation) try: path = deviation["url"].split("/") deviation["index"] = text.parse_int(path[-1] or path[-2]) except KeyError: deviation["index"] = 0 if self.user: deviation["username"] = self.user deviation["_username"] = self.user.lower() else: deviation["username"] = deviation["author"]["username"] deviation["_username"] = deviation["username"].lower() deviation["date"] = dt = text.parse_datetime(deviation["ts"]) deviation["published_time"] = int(util.datetime_to_timestamp(dt)) deviation["da_category"] = "Status" deviation["category_path"] = "status" deviation["is_downloadable"] = False deviation["title"] = "Status Update" comments_count = deviation.pop("comments_count", 0) deviation["stats"] = {"comments": comments_count} if self.comments: deviation["comments"] = ( self._extract_comments(deviation["statusid"], "status") if comments_count else () ) class DeviantartTagExtractor(DeviantartExtractor): """Extractor for deviations from tag searches""" subcategory = "tag" directory_fmt = ("{category}", "Tags", "{search_tags}") archive_fmt = "T_{search_tags}_{index}.{extension}" pattern = r"(?:https?://)?www\.deviantart\.com/tag/([^/?#]+)" example = "https://www.deviantart.com/tag/TAG" def __init__(self, match): DeviantartExtractor.__init__(self, match) self.tag = text.unquote(match[1]) self.user = "" def deviations(self): return self.api.browse_tags(self.tag, self.offset) def prepare(self, deviation): DeviantartExtractor.prepare(self, deviation) deviation["search_tags"] = self.tag class DeviantartWatchExtractor(DeviantartExtractor): """Extractor for Deviations from watched users""" subcategory = "watch" pattern = (r"(?:https?://)?(?:www\.)?deviantart\.com" r"/(?:watch/deviations|notifications/watch)()()") example = "https://www.deviantart.com/watch/deviations" def deviations(self): return self.api.browse_deviantsyouwatch() class DeviantartWatchPostsExtractor(DeviantartExtractor): """Extractor for Posts from watched users""" subcategory = "watch-posts" pattern = r"(?:https?://)?(?:www\.)?deviantart\.com/watch/posts()()" example = "https://www.deviantart.com/watch/posts" def deviations(self): return self.api.browse_posts_deviantsyouwatch() ############################################################################### # Eclipse ##################################################################### class DeviantartDeviationExtractor(DeviantartExtractor): """Extractor for single deviations""" subcategory = "deviation" archive_fmt = "g_{_username}_{index}.{extension}" pattern = (BASE_PATTERN + r"/(art|journal)/(?:[^/?#]+-)?(\d+)" r"|(?:https?://)?(?:www\.)?(?:fx)?deviantart\.com/" r"(?:view/|deviation/|view(?:-full)?\.php/*\?(?:[^#]+&)?id=)" r"(\d+)" # bare deviation ID without slug r"|(?:https?://)?fav\.me/d([0-9a-z]+)") # base36 example = "https://www.deviantart.com/UsER/art/TITLE-12345" skip = Extractor.skip def __init__(self, match): DeviantartExtractor.__init__(self, match) self.type = match[3] self.deviation_id = \ match[4] or match[5] or id_from_base36(match[6]) def deviations(self): if self.user: url = (f"{self.root}/{self.user}" f"/{self.type or 'art'}/{self.deviation_id}") else: url = f"{self.root}/view/{self.deviation_id}/" page = self._limited_request(url, 
notfound="deviation").text uuid = text.extr(page, '"deviationUuid\\":\\"', '\\') if not uuid: raise exception.NotFoundError("deviation") deviation = self.api.deviation(uuid) deviation["_page"] = page deviation["index_file"] = 0 deviation["num"] = deviation["count"] = 1 additional_media = text.extr(page, ',\\"additionalMedia\\":', '}],\\"') if not additional_media: yield deviation return self.filename_fmt = ("{category}_{index}_{index_file}_{title}_" "{num:>02}.{extension}") self.archive_fmt = ("g_{_username}_{index}{index_file:?_//}." "{extension}") additional_media = util.json_loads(self._unescape_json( additional_media) + "}]") deviation["count"] = 1 + len(additional_media) yield deviation for index, post in enumerate(additional_media): uri = self._eclipse_media(post["media"], "fullview")[0] deviation["content"]["src"] = uri deviation["num"] += 1 deviation["index_file"] = post["fileId"] # Download only works on purchased materials - no way to check deviation["is_downloadable"] = False yield deviation class DeviantartScrapsExtractor(DeviantartExtractor): """Extractor for an artist's scraps""" subcategory = "scraps" directory_fmt = ("{category}", "{username}", "Scraps") archive_fmt = "s_{_username}_{index}.{extension}" pattern = BASE_PATTERN + r"/gallery/(?:\?catpath=)?scraps\b" example = "https://www.deviantart.com/USER/gallery/scraps" def deviations(self): self.login() eclipse_api = DeviantartEclipseAPI(self) return self._eclipse_to_oauth( eclipse_api, eclipse_api.gallery_scraps(self.user, self.offset)) class DeviantartSearchExtractor(DeviantartExtractor): """Extractor for deviantart search results""" subcategory = "search" directory_fmt = ("{category}", "Search", "{search_tags}") archive_fmt = "Q_{search_tags}_{index}.{extension}" pattern = (r"(?:https?://)?www\.deviantart\.com" r"/search(?:/deviations)?/?\?([^#]+)") example = "https://www.deviantart.com/search?q=QUERY" skip = Extractor.skip def __init__(self, match): DeviantartExtractor.__init__(self, match) self.query = text.parse_query(self.user) self.search = self.query.get("q", "") self.user = "" def deviations(self): logged_in = self.login() eclipse_api = DeviantartEclipseAPI(self) search = (eclipse_api.search_deviations if logged_in else self._search_html) return self._eclipse_to_oauth(eclipse_api, search(self.query)) def prepare(self, deviation): DeviantartExtractor.prepare(self, deviation) deviation["search_tags"] = self.search def _search_html(self, params): url = self.root + "/search" while True: response = self.request(url, params=params) if response.history and "/users/login" in response.url: raise exception.AbortExtraction("HTTP redirect to login page") page = response.text for dev in DeviantartDeviationExtractor.pattern.findall( page)[2::3]: yield { "deviationId": dev[3], "author": {"username": dev[0]}, "isJournal": dev[2] == "journal", } cursor = text.extr(page, r'\"cursor\":\"', '\\',) if not cursor: return params["cursor"] = cursor class DeviantartGallerySearchExtractor(DeviantartExtractor): """Extractor for deviantart gallery searches""" subcategory = "gallery-search" archive_fmt = "g_{_username}_{index}.{extension}" pattern = BASE_PATTERN + r"/gallery/?\?(q=[^#]+)" example = "https://www.deviantart.com/USER/gallery?q=QUERY" def __init__(self, match): DeviantartExtractor.__init__(self, match) self.query = match[3] def deviations(self): self.login() eclipse_api = DeviantartEclipseAPI(self) query = text.parse_query(self.query) self.search = query["q"] return self._eclipse_to_oauth( eclipse_api, 
eclipse_api.galleries_search( self.user, self.search, self.offset, query.get("sort", "most-recent"), )) def prepare(self, deviation): DeviantartExtractor.prepare(self, deviation) deviation["search_tags"] = self.search class DeviantartFollowingExtractor(DeviantartExtractor): """Extractor for user's watched users""" subcategory = "following" pattern = BASE_PATTERN + "/(?:about#)?watching" example = "https://www.deviantart.com/USER/about#watching" def items(self): api = DeviantartOAuthAPI(self) for user in api.user_friends(self.user): url = f"{self.root}/{user['user']['username']}" user["_extractor"] = DeviantartUserExtractor yield Message.Queue, url, user ############################################################################### # API Interfaces ############################################################## class DeviantartOAuthAPI(): """Interface for the DeviantArt OAuth API https://www.deviantart.com/developers/http/v1/20160316 """ CLIENT_ID = "5388" CLIENT_SECRET = "76b08c69cfb27f26d6161f9ab6d061a1" def __init__(self, extractor): self.extractor = extractor self.log = extractor.log self.headers = {"dA-minor-version": "20210526"} self._warn_429 = True self.delay = extractor.config("wait-min", 0) self.delay_min = max(2, self.delay) self.mature = extractor.config("mature", "true") if not isinstance(self.mature, str): self.mature = "true" if self.mature else "false" self.strategy = extractor.config("pagination") self.folders = extractor.config("folders", False) self.public = extractor.config("public", True) if client_id := extractor.config("client-id"): self.client_id = str(client_id) self.client_secret = extractor.config("client-secret") else: self.client_id = self.CLIENT_ID self.client_secret = self.CLIENT_SECRET token = extractor.config("refresh-token") if token is None or token == "cache": token = "#" + self.client_id if not _refresh_token_cache(token): token = None self.refresh_token_key = token metadata = extractor.config("metadata", False) if not metadata: metadata = True if extractor.extra else False if metadata: self.metadata = True if isinstance(metadata, str): if metadata == "all": metadata = ("submission", "camera", "stats", "collection", "gallery") else: metadata = metadata.replace(" ", "").split(",") elif not isinstance(metadata, (list, tuple)): metadata = () self._metadata_params = {"mature_content": self.mature} self._metadata_public = None if metadata: # extended metadata self.limit = 10 for param in metadata: self._metadata_params["ext_" + param] = "1" if "ext_collection" in self._metadata_params or \ "ext_gallery" in self._metadata_params: if token: self._metadata_public = False else: self.log.error("'collection' and 'gallery' metadata " "require a refresh token") else: # base metadata self.limit = 50 else: self.metadata = False self.limit = None self.log.debug( "Using %s API credentials (client-id %s)", "default" if self.client_id == self.CLIENT_ID else "custom", self.client_id, ) def browse_deviantsyouwatch(self, offset=0): """Yield deviations from users you watch""" endpoint = "/browse/deviantsyouwatch" params = {"limit": 50, "offset": offset, "mature_content": self.mature} return self._pagination(endpoint, params, public=False) def browse_posts_deviantsyouwatch(self, offset=0): """Yield posts from users you watch""" endpoint = "/browse/posts/deviantsyouwatch" params = {"limit": 50, "offset": offset, "mature_content": self.mature} return self._pagination(endpoint, params, public=False, unpack=True) def browse_tags(self, tag, offset=0): """ Browse a tag """ endpoint = 
"/browse/tags" params = { "tag" : tag, "offset" : offset, "limit" : 50, "mature_content": self.mature, } return self._pagination(endpoint, params) def browse_user_journals(self, username, offset=0): journals = filter( lambda post: "/journal/" in post["url"], self.user_profile_posts(username)) if offset: journals = util.advance(journals, offset) return journals def collections(self, username, folder_id, offset=0): """Yield all Deviation-objects contained in a collection folder""" endpoint = "/collections/" + folder_id params = {"username": username, "offset": offset, "limit": 24, "mature_content": self.mature} return self._pagination(endpoint, params) def collections_all(self, username, offset=0): """Yield all deviations in a user's collection""" endpoint = "/collections/all" params = {"username": username, "offset": offset, "limit": 24, "mature_content": self.mature} return self._pagination(endpoint, params) @memcache(keyarg=1) def collections_folders(self, username, offset=0): """Yield all collection folders of a specific user""" endpoint = "/collections/folders" params = {"username": username, "offset": offset, "limit": 50, "mature_content": self.mature} return self._pagination_list(endpoint, params) def comments(self, target_id, target_type="deviation", comment_id=None, offset=0): """Fetch comments posted on a target""" endpoint = f"/comments/{target_type}/{target_id}" params = { "commentid" : comment_id, "maxdepth" : "5", "offset" : offset, "limit" : 50, "mature_content": self.mature, } return self._pagination_list(endpoint, params=params, key="thread") def deviation(self, deviation_id, public=None): """Query and return info about a single Deviation""" endpoint = "/deviation/" + deviation_id deviation = self._call(endpoint, public=public) if deviation.get("is_mature") and public is None and \ self.refresh_token_key: deviation = self._call(endpoint, public=False) if self.metadata: self._metadata((deviation,)) if self.folders: self._folders((deviation,)) return deviation def deviation_content(self, deviation_id, public=None): """Get extended content of a single Deviation""" endpoint = "/deviation/content" params = {"deviationid": deviation_id} content = self._call(endpoint, params=params, public=public) if public and content["html"].startswith( ' <span class=\"username-with-symbol'): if self.refresh_token_key: content = self._call(endpoint, params=params, public=False) else: self.log.warning("Private Journal") return content def deviation_download(self, deviation_id, public=None): """Get the original file download (if allowed)""" endpoint = "/deviation/download/" + deviation_id params = {"mature_content": self.mature} try: return self._call( endpoint, params=params, public=public, log=False) except Exception: if not self.refresh_token_key: raise return self._call(endpoint, params=params, public=False) def deviation_metadata(self, deviations): """ Fetch deviation metadata for a set of deviations""" endpoint = "/deviation/metadata?" 
+ "&".join( f"deviationids[{num}]={deviation['deviationid']}" for num, deviation in enumerate(deviations) ) return self._call( endpoint, params=self._metadata_params, public=self._metadata_public, )["metadata"] def gallery(self, username, folder_id, offset=0, extend=True, public=None): """Yield all Deviation-objects contained in a gallery folder""" endpoint = "/gallery/" + folder_id params = {"username": username, "offset": offset, "limit": 24, "mature_content": self.mature, "mode": "newest"} return self._pagination(endpoint, params, extend, public) def gallery_all(self, username, offset=0): """Yield all Deviation-objects of a specific user""" endpoint = "/gallery/all" params = {"username": username, "offset": offset, "limit": 24, "mature_content": self.mature} return self._pagination(endpoint, params) @memcache(keyarg=1) def gallery_folders(self, username, offset=0): """Yield all gallery folders of a specific user""" endpoint = "/gallery/folders" params = {"username": username, "offset": offset, "limit": 50, "mature_content": self.mature} return self._pagination_list(endpoint, params) def user_friends(self, username, offset=0): """Get the users list of friends""" endpoint = "/user/friends/" + username params = {"limit": 50, "offset": offset, "mature_content": self.mature} return self._pagination(endpoint, params) def user_friends_watch(self, username): """Watch a user""" endpoint = "/user/friends/watch/" + username data = { "watch[friend]" : "0", "watch[deviations]" : "0", "watch[journals]" : "0", "watch[forum_threads]": "0", "watch[critiques]" : "0", "watch[scraps]" : "0", "watch[activity]" : "0", "watch[collections]" : "0", "mature_content" : self.mature, } return self._call( endpoint, method="POST", data=data, public=False, fatal=False, ).get("success") def user_friends_unwatch(self, username): """Unwatch a user""" endpoint = "/user/friends/unwatch/" + username return self._call( endpoint, method="POST", public=False, fatal=False, ).get("success") @memcache(keyarg=1) def user_profile(self, username): """Get user profile information""" endpoint = "/user/profile/" + username return self._call(endpoint, fatal=False) def user_profile_posts(self, username): endpoint = "/user/profile/posts" params = {"username": username, "limit": 50, "mature_content": self.mature} return self._pagination(endpoint, params) def user_statuses(self, username, offset=0): """Yield status updates of a specific user""" statuses = filter( lambda post: "/status-update/" in post["url"], self.user_profile_posts(username)) if offset: statuses = util.advance(statuses, offset) return statuses def authenticate(self, refresh_token_key): """Authenticate the application by requesting an access token""" self.headers["Authorization"] = \ self._authenticate_impl(refresh_token_key) @cache(maxage=3600, keyarg=1) def _authenticate_impl(self, refresh_token_key): """Actual authenticate implementation""" url = "https://www.deviantart.com/oauth2/token" if refresh_token_key: self.log.info("Refreshing private access token") data = {"grant_type": "refresh_token", "refresh_token": _refresh_token_cache(refresh_token_key)} else: self.log.info("Requesting public access token") data = {"grant_type": "client_credentials"} auth = util.HTTPBasicAuth(self.client_id, self.client_secret) response = self.extractor.request( url, method="POST", data=data, auth=auth, fatal=False) data = response.json() if response.status_code != 200: self.log.debug("Server response: %s", data) raise exception.AuthenticationError( f"\"{data.get('error_description')}\" 
({data.get('error')})") if refresh_token_key: _refresh_token_cache.update( refresh_token_key, data["refresh_token"]) return "Bearer " + data["access_token"] def _call(self, endpoint, fatal=True, log=True, public=None, **kwargs): """Call an API endpoint""" url = "https://www.deviantart.com/api/v1/oauth2" + endpoint kwargs["fatal"] = None if public is None: public = self.public while True: if self.delay: self.extractor.sleep(self.delay, "api") self.authenticate(None if public else self.refresh_token_key) kwargs["headers"] = self.headers response = self.extractor.request(url, **kwargs) try: data = response.json() except ValueError: self.log.error("Unable to parse API response") data = {} status = response.status_code if 200 <= status < 400: if self.delay > self.delay_min: self.delay -= 1 return data if not fatal and status != 429: return None error = data.get("error_description") if error == "User not found.": raise exception.NotFoundError("user or group") if error == "Deviation not downloadable.": raise exception.AuthorizationError() self.log.debug(response.text) msg = f"API responded with {status} {response.reason}" if status == 429: if self.delay < 30: self.delay += 1 self.log.warning("%s. Using %ds delay.", msg, self.delay) if self._warn_429 and self.delay >= 3: self._warn_429 = False if self.client_id == self.CLIENT_ID: self.log.info( "Register your own OAuth application and use its " "credentials to prevent this error: " "https://gdl-org.github.io/docs/configuration.html" "#extractor-deviantart-client-id-client-secret") else: if log: self.log.error(msg) return data def _should_switch_tokens(self, results, params): if len(results) < params["limit"]: return True if not self.extractor.jwt: for item in results: if item.get("is_mature"): return True return False def _pagination(self, endpoint, params, extend=True, public=None, unpack=False, key="results"): warn = True if public is None: public = self.public if self.limit and params["limit"] > self.limit: params["limit"] = (params["limit"] // self.limit) * self.limit while True: data = self._call(endpoint, params=params, public=public) try: results = data[key] except KeyError: self.log.error("Unexpected API response: %s", data) return if unpack: results = [item["journal"] for item in results if "journal" in item] if extend: if public and self._should_switch_tokens(results, params): if self.refresh_token_key: self.log.debug("Switching to private access token") public = False continue elif data["has_more"] and warn: warn = False self.log.warning( "Private or mature deviations detected! 
" "Run 'gallery-dl oauth:deviantart' and follow the " "instructions to be able to access them.") # "statusid" cannot be used instead if results and "deviationid" in results[0]: if self.metadata: self._metadata(results) if self.folders: self._folders(results) else: # attempt to fix "deleted" deviations for dev in self._shared_content(results): if not dev["is_deleted"]: continue patch = self._call( "/deviation/" + dev["deviationid"], fatal=False) if patch: dev.update(patch) yield from results if not data["has_more"] and ( self.strategy != "manual" or not results or not extend): return if "next_cursor" in data: if not data["next_cursor"]: return params["offset"] = None params["cursor"] = data["next_cursor"] elif data["next_offset"] is not None: params["offset"] = data["next_offset"] params["cursor"] = None else: if params.get("offset") is None: return params["offset"] = int(params["offset"]) + len(results) def _pagination_list(self, endpoint, params, key="results"): return list(self._pagination(endpoint, params, False, key=key)) def _shared_content(self, results): """Return an iterable of shared deviations in 'results'""" for result in results: for item in result.get("items") or (): if "deviation" in item: yield item["deviation"] def _metadata(self, deviations): """Add extended metadata to each deviation object""" if len(deviations) <= self.limit: self._metadata_batch(deviations) else: n = self.limit for index in range(0, len(deviations), n): self._metadata_batch(deviations[index:index+n]) def _metadata_batch(self, deviations): """Fetch extended metadata for a single batch of deviations""" for deviation, metadata in zip( deviations, self.deviation_metadata(deviations)): deviation.update(metadata) deviation["tags"] = [t["tag_name"] for t in deviation["tags"]] def _folders(self, deviations): """Add a list of all containing folders to each deviation object""" for deviation in deviations: deviation["folders"] = self._folders_map( deviation["author"]["username"])[deviation["deviationid"]] @memcache(keyarg=1) def _folders_map(self, username): """Generate a deviation_id -> folders mapping for 'username'""" self.log.info("Collecting folder information for '%s'", username) folders = self.gallery_folders(username) # create 'folderid'-to-'folder' mapping fmap = { folder["folderid"]: folder for folder in folders } # add parent names to folders, but ignore "Featured" as parent featured = folders[0]["folderid"] done = False while not done: done = True for folder in folders: parent = folder["parent"] if not parent: pass elif parent == featured: folder["parent"] = None else: parent = fmap[parent] if parent["parent"]: done = False else: folder["name"] = parent["name"] + "/" + folder["name"] folder["parent"] = None # map deviationids to folder names dmap = collections.defaultdict(list) for folder in folders: for deviation in self.gallery( username, folder["folderid"], 0, False): dmap[deviation["deviationid"]].append(folder["name"]) return dmap class DeviantartEclipseAPI(): """Interface to the DeviantArt Eclipse API""" def __init__(self, extractor): self.extractor = extractor self.log = extractor.log self.request = self.extractor._limited_request self.csrf_token = None def deviation_extended_fetch(self, deviation_id, user, kind=None): endpoint = "/_puppy/dadeviation/init" params = { "deviationid" : deviation_id, "username" : user, "type" : kind, "include_session" : "false", "expand" : "deviation.related", "da_minor_version": "20230710", } return self._call(endpoint, params) def gallery_scraps(self, user, 
offset=0): endpoint = "/_puppy/dashared/gallection/contents" params = { "username" : user, "type" : "gallery", "offset" : offset, "limit" : 24, "scraps_folder": "true", } return self._pagination(endpoint, params) def galleries_search(self, user, query, offset=0, order="most-recent"): endpoint = "/_puppy/dashared/gallection/search" params = { "username": user, "type" : "gallery", "order" : order, "q" : query, "offset" : offset, "limit" : 24, } return self._pagination(endpoint, params) def search_deviations(self, params): endpoint = "/_puppy/dabrowse/search/deviations" return self._pagination(endpoint, params, key="deviations") def user_info(self, user, expand=False): endpoint = "/_puppy/dauserprofile/init/about" params = {"username": user} return self._call(endpoint, params) def user_watching(self, user, offset=0): gruserid, moduleid = self._ids_watching(user) endpoint = "/_puppy/gruser/module/watching" params = { "gruserid" : gruserid, "gruser_typeid": "4", "username" : user, "moduleid" : moduleid, "offset" : offset, "limit" : 24, } return self._pagination(endpoint, params) def _call(self, endpoint, params): url = "https://www.deviantart.com" + endpoint params["csrf_token"] = self.csrf_token or self._fetch_csrf_token() response = self.request(url, params=params, fatal=None) try: return response.json() except Exception: return {"error": response.text} def _pagination(self, endpoint, params, key="results"): limit = params.get("limit", 24) warn = True while True: data = self._call(endpoint, params) results = data.get(key) if results is None: return if len(results) < limit and warn and data.get("hasMore"): warn = False self.log.warning( "Private deviations detected! " "Provide login credentials or session cookies " "to be able to access them.") yield from results if not data.get("hasMore"): return if "nextCursor" in data: params["offset"] = None params["cursor"] = data["nextCursor"] elif "nextOffset" in data: params["offset"] = data["nextOffset"] params["cursor"] = None elif params.get("offset") is None: return else: params["offset"] = int(params["offset"]) + len(results) def _ids_watching(self, user): url = f"{self.extractor.root}/{user}/about" page = self.request(url).text gruser_id = text.extr(page, ' data-userid="', '"') pos = page.find('\\"name\\":\\"watching\\"') if pos < 0: raise exception.NotFoundError("'watching' module ID") module_id = text.rextr(page, '\\"id\\":', ',', pos).strip('" ') self._fetch_csrf_token(page) return gruser_id, module_id def _fetch_csrf_token(self, page=None): if page is None: page = self.request(self.extractor.root + "/").text self.csrf_token = token = text.extr( page, "window.__CSRF_TOKEN__ = '", "'") return token @memcache(keyarg=1) def _user_details(extr, name): try: return extr.api.user_profile(name)["user"] except Exception: return None @cache(maxage=36500*86400, keyarg=0) def _refresh_token_cache(token): if token and token[0] == "#": return None return token @cache(maxage=28*86400, keyarg=1) def _login_impl(extr, username, password): extr.log.info("Logging in as %s", username) url = "https://www.deviantart.com/users/login" page = extr.request(url).text data = {} for item in text.extract_iter(page, '<input type="hidden" name="', '"/>'): name, _, value = item.partition('" value="') data[name] = value challenge = data.get("challenge") if challenge and challenge != "0": extr.log.warning("Login requires solving a CAPTCHA") extr.log.debug(challenge) data["username"] = username data["password"] = password data["remember"] = "on" extr.sleep(2.0, "login") url = 
"https://www.deviantart.com/_sisu/do/signin" response = extr.request(url, method="POST", data=data) if not response.history: raise exception.AuthenticationError() return { cookie.name: cookie.value for cookie in extr.cookies } def id_from_base36(base36): return util.bdecode(base36, _ALPHABET) def base36_from_id(deviation_id): return util.bencode(int(deviation_id), _ALPHABET) _ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz" ############################################################################### # Journal Formats ############################################################# SHADOW_TEMPLATE = """ <span class="shadow"> <img src="{src}" class="smshadow" width="{width}" height="{height}"> </span> <br><br> """ HEADER_TEMPLATE = """<div usr class="gr"> <div class="metadata"> <h2><a href="{url}">{title}</a></h2> <ul> <li class="author"> by <span class="name"><span class="username-with-symbol u"> <a class="u regular username" href="{userurl}">{username}</a>\ <span class="user-symbol regular"></span></span></span>, <span>{date}</span> </li> </ul> </div> """ HEADER_CUSTOM_TEMPLATE = """<div class='boxtop journaltop'> <h2> <img src="https://st.deviantart.net/minish/gruzecontrol/icons/journal.gif\ ?2" style="vertical-align:middle" alt=""/> <a href="{url}">{title}</a> </h2> Journal Entry: <span>{date}</span> """ JOURNAL_TEMPLATE_HTML = """text:<!DOCTYPE html> <html> <head> <meta charset="utf-8"> <title>{title}
    {shadow}
    {html}
    """ JOURNAL_TEMPLATE_HTML_EXTRA = """\
    \
    {}
    {}
    """ JOURNAL_TEMPLATE_TEXT = """text:{title} by {username}, {date} {content} """ ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/directlink.py0000644000175000017500000000310515040344700021236 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2017-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Direct link handling""" from .common import Extractor, Message from .. import text class DirectlinkExtractor(Extractor): """Extractor for direct links to images and other media files""" category = "directlink" filename_fmt = "{domain}/{path}/{filename}.{extension}" archive_fmt = filename_fmt pattern = (r"(?i)https?://(?P[^/?#]+)/(?P[^?#]+\." r"(?:jpe?g|jpe|png|gif|bmp|svg|web[mp]|avif|heic|psd" r"|mp4|m4v|mov|mkv|og[gmv]|wav|mp3|opus|zip|rar|7z|pdf|swf))" r"(?:\?(?P[^#]*))?(?:#(?P.*))?$") example = "https://en.wikipedia.org/static/images/project-logos/enwiki.png" def __init__(self, match): self.data = data = match.groupdict() self.subcategory = ".".join(data["domain"].rsplit(".", 2)[-2:]) Extractor.__init__(self, match) def items(self): data = self.data for key, value in data.items(): if value: data[key] = text.unquote(value) data["path"], _, name = data["path"].rpartition("/") data["filename"], _, ext = name.rpartition(".") data["extension"] = ext.lower() data["_http_headers"] = { "Referer": self.url.encode("latin-1", "ignore")} yield Message.Directory, data yield Message.Url, self.url, data ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/discord.py0000644000175000017500000003524015040344700020542 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://discord.com/""" from .common import Extractor, Message from .. 
import text, exception BASE_PATTERN = r"(?:https?://)?discord\.com" class DiscordExtractor(Extractor): """Base class for Discord extractors""" category = "discord" root = "https://discord.com" directory_fmt = ("{category}", "{server_id}_{server}", "{channel_id}_{channel}") filename_fmt = "{message_id}_{num:>02}_{filename}.{extension}" archive_fmt = "{message_id}_{num}" server_metadata = {} server_channels_metadata = {} def _init(self): self.token = self.config("token") self.enabled_embeds = self.config("embeds", ["image", "gifv", "video"]) self.enabled_threads = self.config("threads", True) self.api = DiscordAPI(self) def extract_message_text(self, message): text_content = [message["content"]] for embed in message["embeds"]: if embed["type"] == "rich": try: text_content.append(embed["author"]["name"]) except Exception: pass text_content.append(embed.get("title", "")) text_content.append(embed.get("description", "")) for field in embed.get("fields", []): text_content.append(field.get("name", "")) text_content.append(field.get("value", "")) try: text_content.append(embed["footer"]["text"]) except Exception: pass if message.get("poll"): text_content.append(message["poll"]["question"]["text"]) for answer in message["poll"]["answers"]: text_content.append(answer["poll_media"]["text"]) return "\n".join(t for t in text_content if t) def extract_message(self, message): # https://discord.com/developers/docs/resources/message#message-object-message-types if message["type"] in (0, 19, 21): message_metadata = {} message_metadata.update(self.server_metadata) message_metadata.update( self.server_channels_metadata[message["channel_id"]]) message_metadata.update({ "author": message["author"]["username"], "author_id": message["author"]["id"], "author_files": [], "message": self.extract_message_text(message), "message_id": message["id"], "date": text.parse_datetime( message["timestamp"], "%Y-%m-%dT%H:%M:%S.%f%z" ), "files": [] }) for icon_type, icon_path in ( ("avatar", "avatars"), ("banner", "banners") ): if message["author"].get(icon_type): message_metadata["author_files"].append({ "url": (f"https://cdn.discordapp.com/{icon_path}/" f"{message_metadata['author_id']}/" f"{message['author'][icon_type]}.png" f"?size=4096"), "filename": icon_type, "extension": "png", }) message_snapshots = [message] message_snapshots.extend( msg["message"] for msg in message.get("message_snapshots", []) if msg["message"]["type"] in (0, 19, 21) ) for snapshot in message_snapshots: for attachment in snapshot["attachments"]: message_metadata["files"].append({ "url": attachment["url"], "type": "attachment", }) for embed in snapshot["embeds"]: if embed["type"] in self.enabled_embeds: for field in ("video", "image", "thumbnail"): if field not in embed: continue url = embed[field].get("proxy_url") if url is not None: message_metadata["files"].append({ "url": url, "type": "embed", }) break for num, file in enumerate(message_metadata["files"], start=1): text.nameext_from_url(file["url"], file) file["num"] = num yield Message.Directory, message_metadata for file in message_metadata["files"]: message_metadata_file = message_metadata.copy() message_metadata_file.update(file) yield Message.Url, file["url"], message_metadata_file def extract_channel_text(self, channel_id): for message in self.api.get_channel_messages(channel_id): yield from self.extract_message(message) def extract_channel_threads(self, channel_id): for thread in self.api.get_channel_threads(channel_id): id = self.parse_channel(thread)["channel_id"] yield from 
self.extract_channel_text(id) def extract_channel(self, channel_id, safe=False): try: if channel_id not in self.server_channels_metadata: self.parse_channel(self.api.get_channel(channel_id)) channel_type = ( self.server_channels_metadata[channel_id]["channel_type"] ) # https://discord.com/developers/docs/resources/channel#channel-object-channel-types if channel_type in (0, 5): yield from self.extract_channel_text(channel_id) if self.enabled_threads: yield from self.extract_channel_threads(channel_id) elif channel_type in (1, 3, 10, 11, 12): yield from self.extract_channel_text(channel_id) elif channel_type in (15, 16): yield from self.extract_channel_threads(channel_id) elif channel_type in (4,): for channel in self.server_channels_metadata.copy().values(): if channel["parent_id"] == channel_id: yield from self.extract_channel( channel["channel_id"], safe=True) elif not safe: raise exception.AbortExtraction( "This channel type is not supported." ) except exception.HttpError as exc: if not (exc.status == 403 and safe): raise def parse_channel(self, channel): parent_id = channel.get("parent_id") channel_metadata = { "channel": channel.get("name", ""), "channel_id": channel.get("id"), "channel_type": channel.get("type"), "channel_topic": channel.get("topic", ""), "parent_id": parent_id, "is_thread": "thread_metadata" in channel } if parent_id in self.server_channels_metadata: parent_metadata = self.server_channels_metadata[parent_id] channel_metadata.update({ "parent": parent_metadata["channel"], "parent_type": parent_metadata["channel_type"] }) if channel_metadata["channel_type"] in (1, 3): channel_metadata.update({ "channel": "DMs", "recipients": ( [user["username"] for user in channel["recipients"]] ), "recipients_id": ( [user["id"] for user in channel["recipients"]] ) }) channel_id = channel_metadata["channel_id"] self.server_channels_metadata[channel_id] = channel_metadata return channel_metadata def parse_server(self, server): self.server_metadata = { "server": server["name"], "server_id": server["id"], "server_files": [], "owner_id": server["owner_id"] } for icon_type, icon_path in ( ("icon", "icons"), ("banner", "banners"), ("splash", "splashes"), ("discovery_splash", "discovery-splashes") ): if server.get(icon_type): self.server_metadata["server_files"].append({ "url": (f"https://cdn.discordapp.com/{icon_path}/" f"{self.server_metadata['server_id']}/" f"{server[icon_type]}.png?size=4096"), "filename": icon_type, "extension": "png", }) return self.server_metadata def build_server_and_channels(self, server_id): self.parse_server(self.api.get_server(server_id)) for channel in sorted( self.api.get_server_channels(server_id), key=lambda ch: ch["type"] != 4 ): self.parse_channel(channel) class DiscordChannelExtractor(DiscordExtractor): subcategory = "channel" pattern = BASE_PATTERN + r"/channels/(\d+)/(?:\d+/threads/)?(\d+)/?$" example = "https://discord.com/channels/1234567890/9876543210" def items(self): server_id, channel_id = self.groups self.build_server_and_channels(server_id) return self.extract_channel(channel_id) class DiscordMessageExtractor(DiscordExtractor): subcategory = "message" pattern = BASE_PATTERN + r"/channels/(\d+)/(\d+)/(\d+)/?$" example = "https://discord.com/channels/1234567890/9876543210/2468013579" def items(self): server_id, channel_id, message_id = self.groups self.build_server_and_channels(server_id) if channel_id not in self.server_channels_metadata: self.parse_channel(self.api.get_channel(channel_id)) return self.extract_message( 
self.api.get_message(channel_id, message_id)) class DiscordServerExtractor(DiscordExtractor): subcategory = "server" pattern = BASE_PATTERN + r"/channels/(\d+)/?$" example = "https://discord.com/channels/1234567890" def items(self): server_id = self.groups[0] self.build_server_and_channels(server_id) for channel in self.server_channels_metadata.copy().values(): if channel["channel_type"] in (0, 5, 15, 16): yield from self.extract_channel( channel["channel_id"], safe=True) class DiscordDirectMessagesExtractor(DiscordExtractor): subcategory = "direct-messages" directory_fmt = ("{category}", "Direct Messages", "{channel_id}_{recipients:J,}") pattern = BASE_PATTERN + r"/channels/@me/(\d+)/?$" example = "https://discord.com/channels/@me/1234567890" def items(self): return self.extract_channel(self.groups[0]) class DiscordDirectMessageExtractor(DiscordExtractor): subcategory = "direct-message" directory_fmt = ("{category}", "Direct Messages", "{channel_id}_{recipients:J,}") pattern = BASE_PATTERN + r"/channels/@me/(\d+)/(\d+)/?$" example = "https://discord.com/channels/@me/1234567890/9876543210" def items(self): channel_id, message_id = self.groups self.parse_channel(self.api.get_channel(channel_id)) return self.extract_message( self.api.get_message(channel_id, message_id)) class DiscordAPI(): """Interface for the Discord API v10 https://discord.com/developers/docs/reference """ def __init__(self, extractor): self.extractor = extractor self.root = extractor.root + "/api/v10" self.headers = {"Authorization": extractor.token} def get_server(self, server_id): """Get server information""" return self._call("/guilds/" + server_id) def get_server_channels(self, server_id): """Get server channels""" return self._call("/guilds/" + server_id + "/channels") def get_channel(self, channel_id): """Get channel information""" return self._call("/channels/" + channel_id) def get_channel_threads(self, channel_id): """Get channel threads""" THREADS_BATCH = 25 def _method(offset): return self._call("/channels/" + channel_id + "/threads/search", { "sort_by": "last_message_time", "sort_order": "desc", "limit": THREADS_BATCH, "offset": + offset, }).get("threads", []) return self._pagination(_method, THREADS_BATCH) def get_channel_messages(self, channel_id): """Get channel messages""" MESSAGES_BATCH = 100 before = None def _method(_): nonlocal before messages = self._call("/channels/" + channel_id + "/messages", { "limit": MESSAGES_BATCH, "before": before }) if messages: before = messages[-1]["id"] return messages return self._pagination(_method, MESSAGES_BATCH) def get_message(self, channel_id, message_id): """Get message information""" return self._call("/channels/" + channel_id + "/messages", { "limit": 1, "around": message_id })[0] def _call(self, endpoint, params=None): url = self.root + endpoint try: response = self.extractor.request( url, params=params, headers=self.headers) except exception.HttpError as exc: if exc.status == 401: self._raise_invalid_token() raise return response.json() def _pagination(self, method, batch): offset = 0 while True: data = method(offset) yield from data if len(data) < batch: return offset += len(data) def _raise_invalid_token(self): raise exception.AuthenticationError("""Invalid or missing token. 
Please provide a valid token following these instructions: 1) Open Discord in your browser (https://discord.com/app); 2) Open your browser's Developer Tools (F12) and switch to the Network panel; 3) Reload the page and select any request going to https://discord.com/api/...; 4) In the "Headers" tab, look for an entry beginning with "Authorization: "; 5) Right-click the entry and click "Copy Value"; 6) Paste the token in your configuration file under "extractor.discord.token", or run this command with the -o "token=[your token]" argument.""") ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/dynastyscans.py0000644000175000017500000001405215040344700021634 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2015-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://dynasty-scans.com/""" from .common import ChapterExtractor, MangaExtractor, Extractor, Message from .. import text, util BASE_PATTERN = r"(?:https?://)?(?:www\.)?dynasty-scans\.com" class DynastyscansBase(): """Base class for dynastyscans extractors""" category = "dynastyscans" root = "https://dynasty-scans.com" def _parse_image_page(self, image_id): url = f"{self.root}/images/{image_id}" extr = text.extract_from(self.request(url).text) date = extr("class='create_at'>", "") tags = extr("class='tags'>", "") src = extr("class='btn-group'>", "") url = extr(' src="', '"') src = text.extr(src, 'href="', '"') if "Source<" in src else "" return { "url" : self.root + url, "image_id": text.parse_int(image_id), "tags" : text.split_html(tags), "date" : text.remove_html(date), "source" : text.unescape(src), } class DynastyscansChapterExtractor(DynastyscansBase, ChapterExtractor): """Extractor for manga-chapters from dynasty-scans.com""" pattern = BASE_PATTERN + r"(/chapters/[^/?#]+)" example = "https://dynasty-scans.com/chapters/NAME" def metadata(self, page): extr = text.extract_from(page) match = util.re( r"(?:]*>)?([^<]+)(?:
    )?" # manga name r"(?: ch(\d+)([^:<]*))?" # chapter info r"(?:: (.+))?" # title ).match(extr("

    ", "")) author = extr(" by ", "") group = extr('"icon-print"> ', '') return { "manga" : text.unescape(match[1]), "chapter" : text.parse_int(match[2]), "chapter_minor": match[3] or "", "title" : text.unescape(match[4] or ""), "author" : text.remove_html(author), "group" : (text.remove_html(group) or text.extr(group, ' alt="', '"')), "date" : text.parse_datetime(extr( '"icon-calendar"> ', '<'), "%b %d, %Y"), "tags" : text.split_html(extr( "class='tags'>", "") for element in root: if element.tag != "entry": continue content = element[6][0] data["author"] = content[0].text[8:] data["scanlator"] = content[1].text[11:] data["tags"] = content[2].text[6:].lower().split(", ") data["title"] = element[5].text data["date"] = text.parse_datetime( element[1].text, "%Y-%m-%dT%H:%M:%S%z") data["date_updated"] = text.parse_datetime( element[2].text, "%Y-%m-%dT%H:%M:%S%z") yield Message.Queue, element[4].text, data ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/e621.py0000644000175000017500000001237315040344700017572 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2014-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://e621.net/ and other e621 instances""" from .common import Extractor, Message from . import danbooru from ..cache import memcache from .. import text, util class E621Extractor(danbooru.DanbooruExtractor): """Base class for e621 extractors""" basecategory = "E621" page_limit = 750 page_start = None per_page = 320 useragent = util.USERAGENT + " (by mikf)" request_interval_min = 1.0 def items(self): if includes := self.config("metadata") or (): if isinstance(includes, str): includes = includes.split(",") elif not isinstance(includes, (list, tuple)): includes = ("notes", "pools") notes = ("notes" in includes) pools = ("pools" in includes) data = self.metadata() for post in self.posts(): file = post["file"] if not file["url"]: md5 = file["md5"] file["url"] = (f"https://static1.{self.root[8:]}/data" f"/{md5[0:2]}/{md5[2:4]}/{md5}.{file['ext']}") if notes and post.get("has_notes"): post["notes"] = self._get_notes(post["id"]) if pools and post["pools"]: post["pools"] = self._get_pools( ",".join(map(str, post["pools"]))) post["filename"] = file["md5"] post["extension"] = file["ext"] post["date"] = text.parse_datetime( post["created_at"], "%Y-%m-%dT%H:%M:%S.%f%z") post.update(data) yield Message.Directory, post yield Message.Url, file["url"], post def _get_notes(self, id): return self.request_json( f"{self.root}/notes.json?search[post_id]={id}") @memcache(keyarg=1) def _get_pools(self, ids): pools = self.request_json( f"{self.root}/pools.json?search[id]={ids}") for pool in pools: pool["name"] = pool["name"].replace("_", " ") return pools BASE_PATTERN = E621Extractor.update({ "e621": { "root": "https://e621.net", "pattern": r"e621\.(?:net|cc)", }, "e926": { "root": "https://e926.net", "pattern": r"e926\.net", }, "e6ai": { "root": "https://e6ai.net", "pattern": r"e6ai\.net", }, }) class E621TagExtractor(E621Extractor, danbooru.DanbooruTagExtractor): """Extractor for e621 posts from tag searches""" pattern = BASE_PATTERN + r"/posts?(?:\?[^#]*?tags=|/index/\d+/)([^&#]*)" example = "https://e621.net/posts?tags=TAG" class E621PoolExtractor(E621Extractor, danbooru.DanbooruPoolExtractor): """Extractor for e621 pools""" pattern = BASE_PATTERN 
+ r"/pool(?:s|/show)/(\d+)" example = "https://e621.net/pools/12345" def posts(self): self.log.info("Collecting posts of pool %s", self.pool_id) id_to_post = { post["id"]: post for post in self._pagination( "/posts.json", {"tags": "pool:" + self.pool_id}) } posts = [] for num, pid in enumerate(self.post_ids, 1): if pid in id_to_post: post = id_to_post[pid] post["num"] = num posts.append(post) else: self.log.warning("Post %s is unavailable", pid) return posts class E621PostExtractor(E621Extractor, danbooru.DanbooruPostExtractor): """Extractor for single e621 posts""" pattern = BASE_PATTERN + r"/post(?:s|/show)/(\d+)" example = "https://e621.net/posts/12345" def posts(self): url = f"{self.root}/posts/{self.groups[-1]}.json" return (self.request_json(url)["post"],) class E621PopularExtractor(E621Extractor, danbooru.DanbooruPopularExtractor): """Extractor for popular images from e621""" pattern = BASE_PATTERN + r"/explore/posts/popular(?:\?([^#]*))?" example = "https://e621.net/explore/posts/popular" def posts(self): return self._pagination("/popular.json", self.params) class E621FavoriteExtractor(E621Extractor): """Extractor for e621 favorites""" subcategory = "favorite" directory_fmt = ("{category}", "Favorites", "{user_id}") archive_fmt = "f_{user_id}_{id}" pattern = BASE_PATTERN + r"/favorites(?:\?([^#]*))?" example = "https://e621.net/favorites" def metadata(self): self.query = text.parse_query(self.groups[-1]) return {"user_id": self.query.get("user_id", "")} def posts(self): return self._pagination("/favorites.json", self.query) class E621FrontendExtractor(Extractor): """Extractor for alternative e621 frontends""" basecategory = "E621" category = "e621" subcategory = "frontend" pattern = r"(?:https?://)?e621\.(?:cc/\?tags|anthro\.fr/\?q)=([^&#]*)" example = "https://e621.cc/?tags=TAG" def initialize(self): pass def items(self): url = "https://e621.net/posts?tags=" + self.groups[0] data = {"_extractor": E621TagExtractor} yield Message.Queue, url, data ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/erome.py0000644000175000017500000001037415040344700020223 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2021-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.erome.com/""" from .common import Extractor, Message from .. 
import text, util, exception from ..cache import cache import itertools BASE_PATTERN = r"(?:https?://)?(?:www\.)?erome\.com" class EromeExtractor(Extractor): category = "erome" directory_fmt = ("{category}", "{user}") filename_fmt = "{album_id} {title} {num:>02}.{extension}" archive_fmt = "{album_id}_{num}" root = "https://www.erome.com" _cookies = True def items(self): base = f"{self.root}/a/" data = {"_extractor": EromeAlbumExtractor} for album_id in self.albums(): yield Message.Queue, f"{base}{album_id}", data def albums(self): return () def request(self, url, **kwargs): if self._cookies: self._cookies = False self.cookies.update(_cookie_cache()) for _ in range(5): response = Extractor.request(self, url, **kwargs) if response.cookies: _cookie_cache.update("", response.cookies) if response.content.find( b"Please wait a few moments", 0, 600) < 0: return response self.sleep(5.0, "check") def _pagination(self, url, params): for params["page"] in itertools.count(1): page = self.request(url, params=params).text album_ids = EromeAlbumExtractor.pattern.findall(page)[::2] yield from album_ids if len(album_ids) < 36: return class EromeAlbumExtractor(EromeExtractor): """Extractor for albums on erome.com""" subcategory = "album" pattern = BASE_PATTERN + r"/a/(\w+)" example = "https://www.erome.com/a/ID" def items(self): album_id = self.groups[0] url = f"{self.root}/a/{album_id}" try: page = self.request(url).text except exception.HttpError as exc: raise exception.AbortExtraction( f"{album_id}: Unable to fetch album page ({exc})") title, pos = text.extract( page, 'property="og:title" content="', '"') pos = page.index('
    ', pos) urls = [] date = None groups = page.split('
    1: date = text.parse_timestamp(ts) data = { "album_id": album_id, "title" : text.unescape(title), "user" : text.unquote(user), "count" : len(urls), "date" : date, "tags" : ([t.replace("+", " ") for t in text.extract_iter(tags, "?q=", '"')] if tags else ()), "_http_headers": {"Referer": url}, } yield Message.Directory, data for data["num"], url in enumerate(urls, 1): yield Message.Url, url, text.nameext_from_url(url, data) class EromeUserExtractor(EromeExtractor): subcategory = "user" pattern = BASE_PATTERN + r"/(?!a/|search\?)([^/?#]+)" example = "https://www.erome.com/USER" def albums(self): url = f"{self.root}/{self.groups[0]}" return self._pagination(url, {}) class EromeSearchExtractor(EromeExtractor): subcategory = "search" pattern = BASE_PATTERN + r"/search/?\?(q=[^#]+)" example = "https://www.erome.com/search?q=QUERY" def albums(self): url = self.root + "/search" params = text.parse_query(self.groups[0]) return self._pagination(url, params) @cache() def _cookie_cache(): return () ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/everia.py0000644000175000017500000000604215040344700020364 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://everia.club""" from .common import Extractor, Message from .. import text, util BASE_PATTERN = r"(?:https?://)?everia\.club" class EveriaExtractor(Extractor): category = "everia" root = "https://everia.club" def items(self): data = {"_extractor": EveriaPostExtractor} for url in self.posts(): yield Message.Queue, url, data def posts(self): return self._pagination(self.groups[0]) def _pagination(self, path, params=None, pnum=1): find_posts = util.re(r'thumbnail">\s*= 300: return yield from find_posts(response.text) pnum += 1 class EveriaPostExtractor(EveriaExtractor): subcategory = "post" directory_fmt = ("{category}", "{title}") archive_fmt = "{post_url}_{num}" pattern = BASE_PATTERN + r"(/\d{4}/\d{2}/\d{2}/[^/?#]+)" example = "https://everia.club/0000/00/00/TITLE" def items(self): url = self.root + self.groups[0] + "/" page = self.request(url).text content = text.extr(page, 'itemprop="text">', "', "', "")), "post_url": text.unquote(url), "post_category": text.extr( page, "post-in-category-", " ").capitalize(), "count": len(urls), } yield Message.Directory, data for data["num"], url in enumerate(urls, 1): url = text.unquote(url) yield Message.Url, url, text.nameext_from_url(url, data) class EveriaTagExtractor(EveriaExtractor): subcategory = "tag" pattern = BASE_PATTERN + r"(/tag/[^/?#]+)" example = "https://everia.club/tag/TAG" class EveriaCategoryExtractor(EveriaExtractor): subcategory = "category" pattern = BASE_PATTERN + r"(/category/[^/?#]+)" example = "https://everia.club/category/CATEGORY" class EveriaDateExtractor(EveriaExtractor): subcategory = "date" pattern = (BASE_PATTERN + r"(/\d{4}(?:/\d{2})?(?:/\d{2})?)(?:/page/\d+)?/?$") example = "https://everia.club/0000/00/00" class EveriaSearchExtractor(EveriaExtractor): subcategory = "search" pattern = BASE_PATTERN + r"/(?:page/\d+/)?\?s=([^&#]+)" example = "https://everia.club/?s=SEARCH" def posts(self): params = {"s": self.groups[0]} return self._pagination("", params) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 
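# ---------------------------------------------------------------------------
# Illustrative sketch (not part of gallery-dl): the erome and everia
# extractors above paginate the same way -- request page 1, 2, 3, ... and
# stop as soon as a page yields fewer results than the site's page size
# (36 album links per page for erome; an empty page or an HTTP status >= 300
# for everia).  A minimal, generic form of that loop could look like the
# code below; ``fetch_page`` and ``page_size`` are hypothetical stand-ins
# for the real request/parse logic.
import itertools
from typing import Callable, Iterator, Sequence


def paginate(fetch_page: Callable[[int], Sequence[str]],
             page_size: int) -> Iterator[str]:
    """Yield results page by page until a short (or empty) page appears"""
    for pnum in itertools.count(1):
        results = fetch_page(pnum)
        yield from results
        if len(results) < page_size:
            return


# usage example with a fake fetcher: two full pages followed by a short one
# list(paginate(lambda n: ["item"] * (36 if n < 3 else 5), 36))
# ---------------------------------------------------------------------------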
gallery_dl-1.30.2/gallery_dl/extractor/exhentai.py0000644000175000017500000005313615040344700020724 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2014-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://e-hentai.org/ and https://exhentai.org/""" from .common import Extractor, Message from .. import text, util, exception from ..cache import cache import collections import itertools import math BASE_PATTERN = r"(?:https?://)?(e[x-]|g\.e-)hentai\.org" class ExhentaiExtractor(Extractor): """Base class for exhentai extractors""" category = "exhentai" directory_fmt = ("{category}", "{gid} {title[:247]}") filename_fmt = "{gid}_{num:>04}_{image_token}_{filename}.{extension}" archive_fmt = "{gid}_{num}" cookies_domain = ".exhentai.org" cookies_names = ("ipb_member_id", "ipb_pass_hash") root = "https://exhentai.org" request_interval = (3.0, 6.0) ciphers = "DEFAULT:!DH" LIMIT = False def __init__(self, match): Extractor.__init__(self, match) self.version = match[1] def initialize(self): domain = self.config("domain", "auto") if domain == "auto": domain = ("ex" if self.version == "ex" else "e-") + "hentai.org" self.root = "https://" + domain self.api_url = self.root + "/api.php" self.cookies_domain = "." + domain Extractor.initialize(self) if self.version != "ex": self.cookies.set("nw", "1", domain=self.cookies_domain) def request(self, url, **kwargs): response = Extractor.request(self, url, **kwargs) if "Cache-Control" not in response.headers and not response.content: self.log.info("blank page") raise exception.AuthorizationError() return response def login(self): """Login and set necessary cookies""" if self.LIMIT: raise exception.AbortExtraction("Image limit reached!") if self.cookies_check(self.cookies_names): return username, password = self._get_auth_info() if username: return self.cookies_update(self._login_impl(username, password)) if self.version == "ex": self.log.info("No username or cookies given; using e-hentai.org") self.root = "https://e-hentai.org" self.cookies_domain = ".e-hentai.org" self.cookies.set("nw", "1", domain=self.cookies_domain) self.original = False self.limits = False @cache(maxage=90*86400, keyarg=1) def _login_impl(self, username, password): self.log.info("Logging in as %s", username) url = "https://forums.e-hentai.org/index.php?act=Login&CODE=01" headers = { "Referer": "https://e-hentai.org/bounce_login.php?b=d&bt=1-1", } data = { "CookieDate": "1", "b": "d", "bt": "1-1", "UserName": username, "PassWord": password, "ipb_login_submit": "Login!", } self.cookies.clear() response = self.request(url, method="POST", headers=headers, data=data) content = response.content if b"You are now logged in as:" not in content: if b"The captcha was not entered correctly" in content: raise exception.AuthenticationError( "CAPTCHA required. 
Use cookies instead.") raise exception.AuthenticationError() # collect more cookies url = self.root + "/favorites.php" response = self.request(url) if response.history: self.request(url) return self.cookies class ExhentaiGalleryExtractor(ExhentaiExtractor): """Extractor for image galleries from exhentai.org""" subcategory = "gallery" pattern = (BASE_PATTERN + r"(?:/g/(\d+)/([\da-f]{10})" r"|/s/([\da-f]{10})/(\d+)-(\d+))") example = "https://e-hentai.org/g/12345/67890abcde/" def __init__(self, match): ExhentaiExtractor.__init__(self, match) self.gallery_id = text.parse_int(match[2] or match[5]) self.gallery_token = match[3] self.image_token = match[4] self.image_num = text.parse_int(match[6], 1) self.key_start = None self.key_show = None self.key_next = None self.count = 0 self.data = None def _init(self): source = self.config("source") if source == "hitomi": self.items = self._items_hitomi elif source == "metadata": self.items = self._items_metadata limits = self.config("limits", False) if limits and limits.__class__ is int: self.limits = limits self._limits_remaining = 0 else: self.limits = False self.fallback_retries = self.config("fallback-retries", 2) self.original = self.config("original", True) def finalize(self): if self.data: self.log.info("Use '%s/s/%s/%s-%s' as input URL " "to continue downloading from the current position", self.root, self.data["image_token"], self.gallery_id, self.data["num"]) def favorite(self, slot="0"): url = self.root + "/gallerypopups.php" params = { "gid": self.gallery_id, "t" : self.gallery_token, "act": "addfav", } data = { "favcat" : slot, "apply" : "Apply Changes", "update" : "1", } self.request(url, method="POST", params=params, data=data) def items(self): self.login() if self.gallery_token: gpage = self._gallery_page() self.image_token = text.extr(gpage, 'hentai.org/s/', '"') if not self.image_token: self.log.debug("Page content:\n%s", gpage) raise exception.AbortExtraction( "Failed to extract initial image token") ipage = self._image_page() else: ipage = self._image_page() part = text.extr(ipage, 'hentai.org/g/', '"') if not part: self.log.debug("Page content:\n%s", ipage) raise exception.AbortExtraction( "Failed to extract gallery token") self.gallery_token = part.split("/")[1] gpage = self._gallery_page() self.data = data = self.get_metadata(gpage) self.count = text.parse_int(data["filecount"]) yield Message.Directory, data images = itertools.chain( (self.image_from_page(ipage),), self.images_from_api()) for url, image in images: data.update(image) if self.limits: self._limits_check(data) if "/fullimg" in url: data["_http_validate"] = self._validate_response else: data["_http_validate"] = None data["_http_signature"] = self._validate_signature yield Message.Url, url, data fav = self.config("fav") if fav is not None: self.favorite(fav) self.data = None def _items_hitomi(self): if self.config("metadata", False): data = self.metadata_from_api() data["date"] = text.parse_timestamp(data["posted"]) else: data = {} from .hitomi import HitomiGalleryExtractor url = f"https://hitomi.la/galleries/{self.gallery_id}.html" data["_extractor"] = HitomiGalleryExtractor yield Message.Queue, url, data def _items_metadata(self): yield Message.Directory, self.metadata_from_api() def get_metadata(self, page): """Extract gallery metadata""" data = self.metadata_from_page(page) if self.config("metadata", False): data.update(self.metadata_from_api()) data["date"] = text.parse_timestamp(data["posted"]) if self.config("tags", False): tags = collections.defaultdict(list) 
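            # When the 'tags' option is enabled, the loop below groups the
            # gallery's API tags by namespace: each "namespace:value" tag is
            # split at the first ':' and collected into a separate
            # "tags_<namespace>" list on the metadata dict.  For example
            # (hypothetical values), ["artist:foo", "language:english"] would
            # produce data["tags_artist"] = ["foo"] and
            # data["tags_language"] = ["english"].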
for tag in data["tags"]: type, _, value = tag.partition(":") tags[type].append(value) for type, values in tags.items(): data["tags_" + type] = values return data def metadata_from_page(self, page): extr = text.extract_from(page) if api_url := extr('var api_url = "', '"'): self.api_url = api_url data = { "gid" : self.gallery_id, "token" : self.gallery_token, "thumb" : extr("background:transparent url(", ")"), "title" : text.unescape(extr('

    ', '

    ')), "title_jpn" : text.unescape(extr('

    ', '

    ')), "_" : extr('
    ', '<'), "uploader" : extr('
    ', '
    '), "date" : text.parse_datetime(extr( '>Posted:

    '), "%Y-%m-%d %H:%M"), "parent" : extr( '>Parent:
    ', 'Visible:', '<'), "language" : extr('>Language:', ' '), "filesize" : text.parse_bytes(extr( '>File Size:', '<').rstrip("Bbi")), "filecount" : extr('>Length:', ' '), "favorites" : extr('id="favcount">', ' '), "rating" : extr(">Average: ", "<"), "torrentcount" : extr('>Torrent Download (', ')'), } uploader = data["uploader"] if uploader and uploader[0] == "<": data["uploader"] = text.unescape(text.extr(uploader, ">", "<")) f = data["favorites"][0] if f == "N": data["favorites"] = "0" elif f == "O": data["favorites"] = "1" data["lang"] = util.language_to_code(data["language"]) data["tags"] = [ text.unquote(tag.replace("+", " ")) for tag in text.extract_iter(page, 'hentai.org/tag/', '"') ] return data def metadata_from_api(self): data = { "method" : "gdata", "gidlist" : ((self.gallery_id, self.gallery_token),), "namespace": 1, } data = self.request_json(self.api_url, method="POST", json=data) if "error" in data: raise exception.AbortExtraction(data["error"]) return data["gmetadata"][0] def image_from_page(self, page): """Get image url and data from webpage""" pos = page.index('
    = 0: origurl, pos = text.rextract(i6, '"', '"', pos) url = text.unescape(origurl) data = self._parse_original_info(text.extract( i6, "ownload original", "<", pos)[0]) data["_fallback"] = self._fallback_original(nl, url) else: url = imgurl data = self._parse_image_info(url) data["_fallback"] = self._fallback_1280( nl, request["page"], imgkey) except IndexError: self.log.debug("Page content:\n%s", page) raise exception.AbortExtraction( f"Unable to parse image info for '{url}'") data["num"] = request["page"] data["image_token"] = imgkey data["_url_1280"] = imgurl data["_nl"] = nl self._check_509(imgurl) yield url, text.nameext_from_url(url, data) request["imgkey"] = nextkey def _validate_response(self, response): if response.history or not response.headers.get( "content-type", "").startswith("text/html"): return True page = response.text self.log.warning("'%s'", page) if " requires GP" in page: gp = self.config("gp") if gp == "stop": raise exception.AbortExtraction("Not enough GP") elif gp == "wait": self.input("Press ENTER to continue.") return response.url self.log.info("Falling back to non-original downloads") self.original = False return self.data["_url_1280"] if " temporarily banned " in page: raise exception.AuthorizationError("Temporarily Banned") self._limits_exceeded() return response.url def _validate_signature(self, signature): """Return False if all file signature bytes are zero""" if signature: if byte := signature[0]: # 60 == b"<" if byte == 60 and b"", "").replace(",", "") self.log.debug("Image Limits: %s/%s", current, self.limits) self._limits_remaining = self.limits - text.parse_int(current) return page def _check_509(self, url): # full 509.gif URLs # - https://exhentai.org/img/509.gif # - https://ehgt.org/g/509.gif if url.endswith(("hentai.org/img/509.gif", "ehgt.org/g/509.gif")): self.log.debug(url) self._limits_exceeded() def _limits_exceeded(self): msg = "Image limit exceeded!" 
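        # The 'limits-action' option decides what happens once the image
        # limit is reached: unset or "stop" aborts extraction, "wait" pauses
        # for user input and then refreshes the remaining quota, "reset"
        # submits the "reset_imagelimit" form, and any other value logs an
        # error about an invalid 'limits-action' value.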
action = self.config("limits-action") if not action or action == "stop": ExhentaiExtractor.LIMIT = True raise exception.AbortExtraction(msg) self.log.warning(msg) if action == "wait": self.input("Press ENTER to continue.") self._limits_update() elif action == "reset": self._limits_reset() else: self.log.error("Invalid 'limits-action' value '%s'", action) def _limits_check(self, data): if not self._limits_remaining or data["num"] % 25 == 0: self._limits_update() self._limits_remaining -= data["cost"] if self._limits_remaining <= 0: self._limits_exceeded() def _limits_reset(self): self.log.info("Resetting image limits") self._request_home( method="POST", headers={"Content-Type": "application/x-www-form-urlencoded"}, data=b"reset_imagelimit=Reset+Quota") _limits_update = _request_home def _gallery_page(self): url = f"{self.root}/g/{self.gallery_id}/{self.gallery_token}/" response = self.request(url, fatal=False) page = response.text if response.status_code == 404 and "Gallery Not Available" in page: raise exception.AuthorizationError() if page.startswith(("Key missing", "Gallery not found")): raise exception.NotFoundError("gallery") if page.count("hentai.org/mpv/") > 1: self.log.warning("Enabled Multi-Page Viewer is not supported") return page def _image_page(self): url = (f"{self.root}/s/{self.image_token}" f"/{self.gallery_id}-{self.image_num}") page = self.request(url, fatal=False).text if page.startswith(("Invalid page", "Keep trying")): raise exception.NotFoundError("image page") return page def _fallback_original(self, nl, fullimg): url = f"{fullimg}?nl={nl}" for _ in util.repeat(self.fallback_retries): yield url def _fallback_1280(self, nl, num, token=None): if not token: token = self.key_start for _ in util.repeat(self.fallback_retries): url = f"{self.root}/s/{token}/{self.gallery_id}-{num}?nl={nl}" page = self.request(url, fatal=False).text if page.startswith(("Invalid page", "Keep trying")): return url, data = self.image_from_page(page) yield url nl = data["_nl"] def _parse_image_info(self, url): for part in url.split("/")[4:]: try: _, size, width, height, _ = part.split("-") break except ValueError: pass else: size = width = height = 0 return { "cost" : 1, "size" : text.parse_int(size), "width" : text.parse_int(width), "height": text.parse_int(height), } def _parse_original_info(self, info): parts = info.lstrip().split(" ") size = text.parse_bytes(parts[3] + parts[4][0]) return { # 1 initial point + 1 per 0.1 MB "cost" : 1 + math.ceil(size / 100000), "size" : size, "width" : text.parse_int(parts[0]), "height": text.parse_int(parts[2]), } class ExhentaiSearchExtractor(ExhentaiExtractor): """Extractor for exhentai search results""" subcategory = "search" pattern = BASE_PATTERN + r"/(?:\?([^#]*)|tag/([^/?#]+))" example = "https://e-hentai.org/?f_search=QUERY" def __init__(self, match): ExhentaiExtractor.__init__(self, match) _, query, tag = self.groups if tag: if "+" in tag: ns, _, tag = tag.rpartition(":") tag = f"{ns}:\"{tag.replace('+', ' ')}$\"" else: tag += "$" self.params = {"f_search": tag, "page": 0} else: self.params = text.parse_query(query) if "next" not in self.params: self.params["page"] = text.parse_int(self.params.get("page")) def _init(self): self.search_url = self.root def items(self): self.login() data = {"_extractor": ExhentaiGalleryExtractor} search_url = self.search_url params = self.params while True: last = None page = self.request(search_url, params=params).text for match in ExhentaiGalleryExtractor.pattern.finditer(page): url = match[0] if url == last: 
                    continue
                last = url

                data["gallery_id"] = text.parse_int(match[2])
                data["gallery_token"] = match[3]
                yield Message.Queue, url + "/", data

            next_url = text.extr(page, 'nexturl="', '"', None)
            if next_url is not None:
                if not next_url:
                    return
                search_url = next_url
                params = None

            elif 'class="ptdd">><' in page or ">No hits found
    " in page: return else: params["page"] += 1 class ExhentaiFavoriteExtractor(ExhentaiSearchExtractor): """Extractor for favorited exhentai galleries""" subcategory = "favorite" pattern = BASE_PATTERN + r"/favorites\.php(?:\?([^#]*)())?" example = "https://e-hentai.org/favorites.php" def _init(self): self.search_url = self.root + "/favorites.php" ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753459471.0 gallery_dl-1.30.2/gallery_dl/extractor/facebook.py0000644000175000017500000004005515040725417020674 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.facebook.com/""" from .common import Extractor, Message, Dispatch from .. import text, exception from ..cache import memcache BASE_PATTERN = r"(?:https?://)?(?:[\w-]+\.)?facebook\.com" USER_PATTERN = (BASE_PATTERN + r"/(?!media/|photo/|photo.php|watch/)" r"(?:profile\.php\?id=|people/[^/?#]+/)?([^/?&#]+)") class FacebookExtractor(Extractor): """Base class for Facebook extractors""" category = "facebook" root = "https://www.facebook.com" directory_fmt = ("{category}", "{username}", "{title} ({set_id})") filename_fmt = "{id}.{extension}" archive_fmt = "{id}.{extension}" def _init(self): headers = self.session.headers headers["Accept"] = ( "text/html,application/xhtml+xml,application/xml;q=0.9," "image/avif,image/webp,image/png,image/svg+xml,*/*;q=0.8" ) headers["Sec-Fetch-Dest"] = "empty" headers["Sec-Fetch-Mode"] = "navigate" headers["Sec-Fetch-Site"] = "same-origin" self.fallback_retries = self.config("fallback-retries", 2) self.videos = self.config("videos", True) self.author_followups = self.config("author-followups", False) def decode_all(self, txt): return text.unescape( txt.encode().decode("unicode_escape") .encode("utf_16", "surrogatepass").decode("utf_16") ).replace("\\/", "/") def parse_set_page(self, set_page): directory = { "set_id": text.extr( set_page, '"mediaSetToken":"', '"' ) or text.extr( set_page, '"mediasetToken":"', '"' ), "username": self.decode_all( text.extr( set_page, '"user":{"__isProfile":"User","name":"', '","' ) or text.extr( set_page, '"actors":[{"__typename":"User","name":"', '","' ) ), "user_id": text.extr( set_page, '"owner":{"__typename":"User","id":"', '"' ), "title": self.decode_all(text.extr( set_page, '"title":{"text":"', '"' )), "first_photo_id": text.extr( set_page, '{"__typename":"Photo","__isMedia":"Photo","', '","creation_story"' ).rsplit('"id":"', 1)[-1] or text.extr( set_page, '{"__typename":"Photo","id":"', '"' ) } return directory def parse_photo_page(self, photo_page): photo = { "id": text.extr( photo_page, '"__isNode":"Photo","id":"', '"' ), "set_id": text.extr( photo_page, '"url":"https:\\/\\/www.facebook.com\\/photo\\/?fbid=', '"' ).rsplit("&set=", 1)[-1], "username": self.decode_all(text.extr( photo_page, '"owner":{"__typename":"User","name":"', '"' )), "user_id": text.extr( photo_page, '"owner":{"__typename":"User","id":"', '"' ), "caption": self.decode_all(text.extr( photo_page, '"message":{"delight_ranges"', '"},"message_preferred_body"' ).rsplit('],"text":"', 1)[-1]), "date": text.parse_timestamp( text.extr(photo_page, '\\"publish_time\\":', ',') or text.extr(photo_page, '"created_time":', ',') ), "url": self.decode_all(text.extr( photo_page, ',"image":{"uri":"', '","' )), "next_photo_id": text.extr( photo_page, 
'"nextMediaAfterNodeId":{"__typename":"Photo","id":"', '"' ) or text.extr( photo_page, '"nextMedia":{"edges":[{"node":{"__typename":"Photo","id":"', '"' ) } text.nameext_from_url(photo["url"], photo) photo["followups_ids"] = [] for comment_raw in text.extract_iter( photo_page, '{"node":{"id"', '"cursor":null}' ): if ('"is_author_original_poster":true' in comment_raw and '{"__typename":"Photo","id":"' in comment_raw): photo["followups_ids"].append(text.extr( comment_raw, '{"__typename":"Photo","id":"', '"' )) return photo def parse_post_page(self, post_page): first_photo_url = text.extr( text.extr( post_page, '"__isMedia":"Photo"', '"target_group"' ), '"url":"', ',' ) post = { "set_id": text.extr(post_page, '{"mediaset_token":"', '"') or text.extr(first_photo_url, 'set=', '"').rsplit("&", 1)[0] } return post def parse_video_page(self, video_page): video = { "id": text.extr( video_page, '\\"video_id\\":\\"', '\\"' ), "username": self.decode_all(text.extr( video_page, '"actors":[{"__typename":"User","name":"', '","' )), "user_id": text.extr( video_page, '"owner":{"__typename":"User","id":"', '"' ), "date": text.parse_timestamp(text.extr( video_page, '\\"publish_time\\":', ',' )), "type": "video" } if not video["username"]: video["username"] = self.decode_all(text.extr( video_page, '"__typename":"User","id":"' + video["user_id"] + '","name":"', '","' )) first_video_raw = text.extr( video_page, '"permalink_url"', '\\/Period>\\u003C\\/MPD>' ) audio = { **video, "url": self.decode_all(text.extr( text.extr( first_video_raw, "AudioChannelConfiguration", "BaseURL>\\u003C" ), "BaseURL>", "\\u003C\\/" )), "type": "audio" } video["urls"] = {} for raw_url in text.extract_iter( first_video_raw, 'FBQualityLabel=\\"', '\\u003C\\/BaseURL>' ): resolution = raw_url.split('\\"', 1)[0] video["urls"][resolution] = self.decode_all( raw_url.split('BaseURL>', 1)[1] ) if not video["urls"]: return video, audio video["url"] = max( video["urls"].items(), key=lambda x: text.parse_int(x[0][:-1]) )[1] text.nameext_from_url(video["url"], video) audio["filename"] = video["filename"] audio["extension"] = "m4a" return video, audio def photo_page_request_wrapper(self, url, **kwargs): LEFT_OFF_TXT = "" if url.endswith("&set=") else ( "\nYou can use this URL to continue from " "where you left off (added \"&setextract\"): " "\n" + url + "&setextract" ) res = self.request(url, **kwargs) if res.url.startswith(self.root + "/login"): raise exception.AuthRequired( message=(f"You must be logged in to continue viewing images." f"{LEFT_OFF_TXT}") ) if b'{"__dr":"CometErrorRoot.react"}' in res.content: raise exception.AbortExtraction( f"You've been temporarily blocked from viewing images.\n" f"Please try using a different account, " f"using a VPN or waiting before you retry.{LEFT_OFF_TXT}" ) return res def extract_set(self, set_data): set_id = set_data["set_id"] all_photo_ids = [set_data["first_photo_id"]] retries = 0 i = 0 while i < len(all_photo_ids): photo_id = all_photo_ids[i] photo_url = f"{self.root}/photo/?fbid={photo_id}&set={set_id}" photo_page = self.photo_page_request_wrapper(photo_url).text photo = self.parse_photo_page(photo_page) photo["num"] = i + 1 if self.author_followups: for followup_id in photo["followups_ids"]: if followup_id not in all_photo_ids: self.log.debug( "Found a followup in comments: %s", followup_id ) all_photo_ids.append(followup_id) if not photo["url"]: if retries < self.fallback_retries and self._interval_429: seconds = self._interval_429() self.log.warning( "Failed to find photo download URL for %s. 
" "Retrying in %s seconds.", photo_url, seconds, ) self.wait(seconds=seconds, reason="429 Too Many Requests") retries += 1 continue else: self.log.error( "Failed to find photo download URL for " + photo_url + ". Skipping." ) retries = 0 else: retries = 0 photo.update(set_data) yield Message.Directory, photo yield Message.Url, photo["url"], photo if not photo["next_photo_id"]: self.log.debug( "Can't find next image in the set. " "Extraction is over." ) elif photo["next_photo_id"] in all_photo_ids: if photo["next_photo_id"] != photo["id"]: self.log.debug( "Detected a loop in the set, it's likely finished. " "Extraction is over." ) else: all_photo_ids.append(photo["next_photo_id"]) i += 1 @memcache(keyarg=1) def _extract_profile_photos_page(self, profile): profile_photos_url = f"{self.root}/{profile}/photos_by" for _ in range(self.fallback_retries + 1): profile_photos_page = self.request(profile_photos_url).text if set_id := self._extract_profile_set_id(profile_photos_page): break self.log.debug("Got empty profile photos page, retrying...") else: raise exception.AbortExtraction("Failed to extract profile data") avatar_page_url = text.extr( profile_photos_page, ',"profilePhoto":{"url":"', '"') return set_id, avatar_page_url.replace("\\/", "/") def _extract_profile_set_id(self, profile_photos_page): set_ids_raw = text.extr( profile_photos_page, '"pageItems"', '"page_info"' ) set_id = text.extr( set_ids_raw, 'set=', '"' ).rsplit("&", 1)[0] or text.extr( set_ids_raw, '\\/photos\\/', '\\/' ) return set_id class FacebookSetExtractor(FacebookExtractor): """Base class for Facebook Set extractors""" subcategory = "set" pattern = ( BASE_PATTERN + r"/(?:(?:media/set|photo)/?\?(?:[^&#]+&)*set=([^&#]+)" r"[^/?#]*(? %s)", item["id"], item["feeRequired"], fee_max) continue try: url = "https://api.fanbox.cc/post.info?postId=" + item["id"] body = self.request_json(url, headers=self.headers)["body"] content_body, post = self._extract_post(body) except Exception as exc: self.log.warning("Skipping post %s (%s: %s)", item["id"], exc.__class__.__name__, exc) continue yield Message.Directory, post yield from self._get_urls_from_post(content_body, post) def posts(self): """Return all relevant post objects""" def _pagination(self, url): while url: url = text.ensure_http_scheme(url) body = self.request_json(url, headers=self.headers)["body"] yield from body["items"] url = body["nextUrl"] def _extract_post(self, post): """Fetch and process post data""" post["archives"] = () if content_body := post.pop("body", None): if "html" in content_body: post["html"] = content_body["html"] if post["type"] == "article": post["articleBody"] = content_body.copy() if "blocks" in content_body: content = [] # text content images = [] # image IDs in 'body' order files = [] # file IDs in 'body' order for block in content_body["blocks"]: if "text" in block: content.append(block["text"]) if "links" in block: for link in block["links"]: content.append(link["url"]) if "imageId" in block: images.append(block["imageId"]) if "fileId" in block: files.append(block["fileId"]) post["content"] = "\n".join(content) self._sort_map(content_body, "imageMap", images) if file_map := self._sort_map(content_body, "fileMap", files): exts = util.EXTS_ARCHIVE post["archives"] = [ file for file in file_map.values() if file.get("extension", "").lower() in exts ] post["date"] = text.parse_datetime(post["publishedDatetime"]) post["text"] = content_body.get("text") if content_body else None post["isCoverImage"] = False if self._meta_user: post["user"] = 
self._get_user_data(post["creatorId"]) if self._meta_plan: plans = self._get_plan_data(post["creatorId"]) fee = post["feeRequired"] try: post["plan"] = plans[fee] except KeyError: if fees := [f for f in plans if f >= fee]: plan = plans[min(fees)] else: plan = plans[0].copy() plan["fee"] = fee post["plan"] = plans[fee] = plan if self._meta_comments: if post["commentCount"]: post["comments"] = list(self._get_comment_data(post["id"])) else: post["commentd"] = () return content_body, post def _sort_map(self, body, key, ids): orig = body.get(key) if not orig: return {} if orig is None else orig body[key] = new = { id: orig[id] for id in ids if id in orig } return new @memcache(keyarg=1) def _get_user_data(self, creator_id): url = "https://api.fanbox.cc/creator.get" params = {"creatorId": creator_id} data = self.request_json(url, params=params, headers=self.headers) user = data["body"] user.update(user.pop("user")) return user @memcache(keyarg=1) def _get_plan_data(self, creator_id): url = "https://api.fanbox.cc/plan.listCreator" params = {"creatorId": creator_id} data = self.request_json(url, params=params, headers=self.headers) plans = {0: { "id" : "", "title" : "", "fee" : 0, "description" : "", "coverImageUrl" : "", "creatorId" : creator_id, "hasAdultContent": None, "paymentMethod" : None, }} for plan in data["body"]: del plan["user"] plans[plan["fee"]] = plan return plans def _get_comment_data(self, post_id): url = ("https://api.fanbox.cc/post.getComments" "?limit=10&postId=" + post_id) comments = [] while url: url = text.ensure_http_scheme(url) body = self.request_json(url, headers=self.headers)["body"] data = body["commentList"] comments.extend(data["items"]) url = data["nextUrl"] return comments def _get_urls_from_post(self, content_body, post): num = 0 if cover_image := post.get("coverImageUrl"): cover_image = util.re("/c/[0-9a-z_]+").sub("", cover_image) final_post = post.copy() final_post["isCoverImage"] = True final_post["fileUrl"] = cover_image text.nameext_from_url(cover_image, final_post) final_post["num"] = num num += 1 yield Message.Url, cover_image, final_post if not content_body: return if "html" in content_body: html_urls = [] for href in text.extract_iter(content_body["html"], 'href="', '"'): if "fanbox.pixiv.net/images/entry" in href: html_urls.append(href) elif "downloads.fanbox.cc" in href: html_urls.append(href) for src in text.extract_iter(content_body["html"], 'data-src-original="', '"'): html_urls.append(src) for url in html_urls: final_post = post.copy() text.nameext_from_url(url, final_post) final_post["fileUrl"] = url final_post["num"] = num num += 1 yield Message.Url, url, final_post for group in ("images", "imageMap"): if group in content_body: for item in content_body[group]: if group == "imageMap": # imageMap is a dict with image objects as values item = content_body[group][item] final_post = post.copy() final_post["fileUrl"] = item["originalUrl"] text.nameext_from_url(item["originalUrl"], final_post) if "extension" in item: final_post["extension"] = item["extension"] final_post["fileId"] = item.get("id") final_post["width"] = item.get("width") final_post["height"] = item.get("height") final_post["num"] = num num += 1 yield Message.Url, item["originalUrl"], final_post for group in ("files", "fileMap"): if group in content_body: for item in content_body[group]: if group == "fileMap": # fileMap is a dict with file objects as values item = content_body[group][item] final_post = post.copy() final_post["fileUrl"] = item["url"] text.nameext_from_url(item["url"], 
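# Illustrative sketch of the plan lookup fallback in _extract_post() above:
# when a post's required fee has no exactly matching plan, the cheapest plan
# at or above that fee is chosen; failing that, the free plan is copied with
# the fee filled in. The plan data below is a made-up example.
def match_plan(plans, fee):
    if fee in plans:
        return plans[fee]
    if candidates := [f for f in plans if f >= fee]:
        return plans[min(candidates)]
    plan = dict(plans[0])
    plan["fee"] = fee
    return plan

plans = {0: {"fee": 0, "title": ""}, 500: {"fee": 500, "title": "basic"}}
print(match_plan(plans, 300)["title"])   # basic  (cheapest plan >= 300)
print(match_plan(plans, 1000)["fee"])    # 1000   (copy of the free plan)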
final_post) if "extension" in item: final_post["extension"] = item["extension"] if "name" in item: final_post["filename"] = item["name"] final_post["fileId"] = item.get("id") final_post["num"] = num num += 1 yield Message.Url, item["url"], final_post if self.embeds: embeds_found = [] if "video" in content_body: embeds_found.append(content_body["video"]) embeds_found.extend(content_body.get("embedMap", {}).values()) for embed in embeds_found: # embed_result is (message type, url, metadata dict) embed_result = self._process_embed(post, embed) if not embed_result: continue embed_result[2]["num"] = num num += 1 yield embed_result def _process_embed(self, post, embed): final_post = post.copy() provider = embed["serviceProvider"] content_id = embed.get("videoId") or embed.get("contentId") prefix = "ytdl:" if self.embeds == "ytdl" else "" url = None is_video = False if provider == "soundcloud": url = prefix+"https://soundcloud.com/"+content_id is_video = True elif provider == "youtube": url = prefix+"https://youtube.com/watch?v="+content_id is_video = True elif provider == "vimeo": url = prefix+"https://vimeo.com/"+content_id is_video = True elif provider == "fanbox": # this is an old URL format that redirects # to a proper Fanbox URL url = "https://www.pixiv.net/fanbox/"+content_id # resolve redirect try: url = self.request_location(url) except Exception as exc: url = None self.log.warning("Unable to extract fanbox embed %s (%s: %s)", content_id, exc.__class__.__name__, exc) else: final_post["_extractor"] = FanboxPostExtractor elif provider == "twitter": url = "https://twitter.com/_/status/"+content_id elif provider == "google_forms": url = (f"https://docs.google.com/forms/d/e/" f"{content_id}/viewform?usp=sf_link") else: self.log.warning(f"service not recognized: {provider}") if url: final_post["embed"] = embed final_post["embedUrl"] = url text.nameext_from_url(url, final_post) msg_type = Message.Queue if is_video and self.embeds == "ytdl": msg_type = Message.Url return msg_type, url, final_post class FanboxCreatorExtractor(FanboxExtractor): """Extractor for a Fanbox creator's works""" subcategory = "creator" pattern = USER_PATTERN + r"(?:/posts)?/?$" example = "https://USER.fanbox.cc/" def posts(self): url = "https://api.fanbox.cc/post.paginateCreator?creatorId=" creator_id = self.groups[0] or self.groups[1] return self._pagination_creator(url + creator_id) def _pagination_creator(self, url): urls = self.request_json(url, headers=self.headers)["body"] for url in urls: url = text.ensure_http_scheme(url) yield from self.request_json(url, headers=self.headers)["body"] class FanboxPostExtractor(FanboxExtractor): """Extractor for media from a single Fanbox post""" subcategory = "post" pattern = USER_PATTERN + r"/posts/(\d+)" example = "https://USER.fanbox.cc/posts/12345" def posts(self): return ({"id": self.groups[2], "feeRequired": 0},) class FanboxHomeExtractor(FanboxExtractor): """Extractor for your Fanbox home feed""" subcategory = "home" pattern = BASE_PATTERN + r"/?$" example = "https://fanbox.cc/" def posts(self): url = "https://api.fanbox.cc/post.listHome?limit=10" return self._pagination(url) class FanboxSupportingExtractor(FanboxExtractor): """Extractor for your supported Fanbox users feed""" subcategory = "supporting" pattern = BASE_PATTERN + r"/home/supporting" example = "https://fanbox.cc/home/supporting" def posts(self): url = "https://api.fanbox.cc/post.listSupporting?limit=10" return self._pagination(url) class FanboxRedirectExtractor(Extractor): """Extractor for pixiv redirects to 
fanbox.cc""" category = "fanbox" subcategory = "redirect" pattern = r"(?:https?://)?(?:www\.)?pixiv\.net/fanbox/creator/(\d+)" example = "https://www.pixiv.net/fanbox/creator/12345" def items(self): url = "https://www.pixiv.net/fanbox/creator/" + self.groups[0] location = self.request_location(url, notfound="user") yield Message.Queue, location, {"_extractor": FanboxCreatorExtractor} ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/fantia.py0000644000175000017500000001550015040344700020352 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://fantia.jp/""" from .common import Extractor, Message from .. import text, util class FantiaExtractor(Extractor): """Base class for Fantia extractors""" category = "fantia" root = "https://fantia.jp" directory_fmt = ("{category}", "{fanclub_id}") filename_fmt = "{post_id}_{file_id}.{extension}" archive_fmt = "{post_id}_{file_id}" _warning = True def _init(self): self.headers = { "Accept" : "application/json, text/plain, */*", "X-Requested-With": "XMLHttpRequest", } self._empty_plan = { "id" : 0, "price": 0, "limit": 0, "name" : "", "description": "", "thumb": self.root + "/images/fallback/plan/thumb_default.png", } if self._warning: if not self.cookies_check(("_session_id",)): self.log.warning("no '_session_id' cookie set") FantiaExtractor._warning = False def items(self): for post_id in self.posts(): post = self._get_post_data(post_id) post["num"] = 0 contents = self._get_post_contents(post) post["content_count"] = len(contents) post["content_num"] = 0 for content in contents: files = self._process_content(post, content) yield Message.Directory, post if content["visible_status"] != "visible": self.log.warning( "Unable to download '%s' files from " "%s#post-content-id-%s", content["visible_status"], post["post_url"], content["id"]) for file in files: post.update(file) post["num"] += 1 text.nameext_from_url( post["content_filename"] or file["file_url"], post) yield Message.Url, file["file_url"], post post["content_num"] += 1 def posts(self): """Return post IDs""" def _pagination(self, url): params = {"page": 1} while True: page = self.request(url, params=params).text self._csrf_token(page) post_id = None for post_id in text.extract_iter( page, 'class="link-block" href="/posts/', '"'): yield post_id if not post_id: return params["page"] += 1 def _csrf_token(self, page=None): if not page: page = self.request(self.root + "/").text self.headers["X-CSRF-Token"] = text.extr( page, 'name="csrf-token" content="', '"') def _get_post_data(self, post_id): """Fetch and process post data""" url = self.root+"/api/v1/posts/"+post_id resp = self.request_json(url, headers=self.headers)["post"] return { "post_id": resp["id"], "post_url": self.root + "/posts/" + str(resp["id"]), "post_title": resp["title"], "comment": resp["comment"], "rating": resp["rating"], "posted_at": resp["posted_at"], "date": text.parse_datetime( resp["posted_at"], "%a, %d %b %Y %H:%M:%S %z"), "fanclub_id": resp["fanclub"]["id"], "fanclub_user_id": resp["fanclub"]["user"]["id"], "fanclub_user_name": resp["fanclub"]["user"]["name"], "fanclub_name": resp["fanclub"]["name"], "fanclub_url": self.root+"/fanclubs/"+str(resp["fanclub"]["id"]), "tags": [t["name"] for t in resp["tags"]], "_data": resp, } def _get_post_contents(self, 
post): contents = post["_data"]["post_contents"] try: url = post["_data"]["thumb"]["original"] except Exception: pass else: contents.insert(0, { "id": "thumb", "title": "thumb", "category": "thumb", "download_uri": url, "visible_status": "visible", "plan": None, }) return contents def _process_content(self, post, content): post["content_category"] = content["category"] post["content_title"] = content["title"] post["content_filename"] = content.get("filename") or "" post["content_id"] = content["id"] post["content_comment"] = content.get("comment") or "" post["content_num"] += 1 post["plan"] = content["plan"] or self._empty_plan files = [] if "post_content_photos" in content: for photo in content["post_content_photos"]: files.append({"file_id" : photo["id"], "file_url": photo["url"]["original"]}) if "download_uri" in content: url = content["download_uri"] if url[0] == "/": url = self.root + url files.append({"file_id" : content["id"], "file_url": url}) if content["category"] == "blog" and "comment" in content: comment_json = util.json_loads(content["comment"]) blog_text = "" for op in comment_json.get("ops") or (): insert = op.get("insert") if isinstance(insert, str): blog_text += insert elif isinstance(insert, dict) and "fantiaImage" in insert: img = insert["fantiaImage"] files.append({"file_id" : img["id"], "file_url": self.root + img["original_url"]}) post["blogpost_text"] = blog_text else: post["blogpost_text"] = "" return files class FantiaCreatorExtractor(FantiaExtractor): """Extractor for a Fantia creator's works""" subcategory = "creator" pattern = r"(?:https?://)?(?:www\.)?fantia\.jp/fanclubs/(\d+)" example = "https://fantia.jp/fanclubs/12345" def __init__(self, match): FantiaExtractor.__init__(self, match) self.creator_id = match[1] def posts(self): url = f"{self.root}/fanclubs/{self.creator_id}/posts" return self._pagination(url) class FantiaPostExtractor(FantiaExtractor): """Extractor for media from a single Fantia post""" subcategory = "post" pattern = r"(?:https?://)?(?:www\.)?fantia\.jp/posts/(\d+)" example = "https://fantia.jp/posts/12345" def __init__(self, match): FantiaExtractor.__init__(self, match) self.post_id = match[1] def posts(self): self._csrf_token() return (self.post_id,) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/fapachi.py0000644000175000017500000000434315040344700020506 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://fapachi.com/""" from .common import Extractor, Message from .. 
import text class FapachiPostExtractor(Extractor): """Extractor for individual posts on fapachi.com""" category = "fapachi" subcategory = "post" root = "https://fapachi.com" directory_fmt = ("{category}", "{user}") filename_fmt = "{user}_{id}.{extension}" archive_fmt = "{user}_{id}" pattern = (r"(?:https?://)?(?:www\.)?fapachi\.com" r"/(?!search/)([^/?#]+)/media/(\d+)") example = "https://fapachi.com/MODEL/media/12345" def __init__(self, match): Extractor.__init__(self, match) self.user, self.id = match.groups() def items(self): data = { "user": self.user, "id" : self.id, } page = self.request(f"{self.root}/{self.user}/media/{self.id}").text url = self.root + text.extract( page, 'data-src="', '"', page.index('class="media-img'))[0] yield Message.Directory, data yield Message.Url, url, text.nameext_from_url(url, data) class FapachiUserExtractor(Extractor): """Extractor for all posts from a fapachi user""" category = "fapachi" subcategory = "user" root = "https://fapachi.com" pattern = (r"(?:https?://)?(?:www\.)?fapachi\.com" r"/(?!search(?:/|$))([^/?#]+)(?:/page/(\d+))?$") example = "https://fapachi.com/MODEL" def __init__(self, match): Extractor.__init__(self, match) self.user = match[1] self.num = text.parse_int(match[2], 1) def items(self): data = {"_extractor": FapachiPostExtractor} while True: url = f"{self.root}/{self.user}/page/{self.num}" page = self.request(url).text for post in text.extract_iter(page, 'model-media-prew">', ">"): if path := text.extr(post, '
    Next page' not in page: return self.num += 1 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/fapello.py0000644000175000017500000000766515040344700020547 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://fapello.com/""" from .common import Extractor, Message from .. import text, exception BASE_PATTERN = r"(?:https?://)?(?:www\.)?fapello\.(?:com|su)" class FapelloPostExtractor(Extractor): """Extractor for individual posts on fapello.com""" category = "fapello" subcategory = "post" directory_fmt = ("{category}", "{model}") filename_fmt = "{model}_{id}.{extension}" archive_fmt = "{type}_{model}_{id}" pattern = BASE_PATTERN + r"/(?!search/|popular_videos/)([^/?#]+)/(\d+)" example = "https://fapello.com/MODEL/12345/" def __init__(self, match): Extractor.__init__(self, match) self.root = text.root_from_url(match[0]) self.model, self.id = match.groups() def items(self): url = f"{self.root}/{self.model}/{self.id}/" page = text.extr( self.request(url, allow_redirects=False).text, 'class="uk-align-center"', "
    ", None) if page is None: raise exception.NotFoundError("post") data = { "model": self.model, "id" : text.parse_int(self.id), "type" : "video" if 'type="video' in page else "photo", "thumbnail": text.extr(page, 'poster="', '"'), } url = text.extr(page, 'src="', '"').replace( ".md", "").replace(".th", "") yield Message.Directory, data yield Message.Url, url, text.nameext_from_url(url, data) class FapelloModelExtractor(Extractor): """Extractor for all posts from a fapello model""" category = "fapello" subcategory = "model" pattern = (BASE_PATTERN + r"/(?!top-(?:likes|followers)|popular_videos" r"|videos|trending|search/?$)" r"([^/?#]+)/?$") example = "https://fapello.com/model/" def __init__(self, match): Extractor.__init__(self, match) self.root = text.root_from_url(match[0]) self.model = match[1] def items(self): num = 1 data = {"_extractor": FapelloPostExtractor} while True: url = f"{self.root}/ajax/model/{self.model}/page-{num}/" page = self.request(url).text if not page: return url = None for url in text.extract_iter(page, '', ""): yield Message.Queue, text.extr(item, ' 0 and (int(size["width"]) > self.maxsize or int(size["height"]) > self.maxsize): del sizes[index:] break return sizes def photos_search(self, params): """Return a list of photos matching some criteria.""" return self._pagination("photos.search", params.copy()) def photosets_getInfo(self, photoset_id, user_id): """Gets information about a photoset.""" params = {"photoset_id": photoset_id, "user_id": user_id} photoset = self._call("photosets.getInfo", params)["photoset"] return self._clean_info(photoset) def photosets_getList(self, user_id): """Returns the photosets belonging to the specified user.""" params = {"user_id": user_id} return self._pagination_sets("photosets.getList", params) def photosets_getPhotos(self, photoset_id): """Get the list of photos in a set.""" params = {"photoset_id": photoset_id} return self._pagination("photosets.getPhotos", params, "photoset") def urls_lookupGroup(self, groupname): """Returns a group NSID, given the url to a group's page.""" params = {"url": "https://www.flickr.com/groups/" + groupname} group = self._call("urls.lookupGroup", params)["group"] return {"nsid": group["id"], "path_alias": groupname, "groupname": group["groupname"]["_content"]} def urls_lookupUser(self, username): """Returns a user NSID, given the url to a user's photos or profile.""" params = {"url": "https://www.flickr.com/photos/" + username} user = self._call("urls.lookupUser", params)["user"] return { "nsid" : user["id"], "username" : user["username"]["_content"], "path_alias": username, } def video_getStreamInfo(self, video_id, secret=None): """Returns all available video streams""" params = {"photo_id": video_id} if not secret: secret = self._call("photos.getInfo", params)["photo"]["secret"] params["secret"] = secret stream = self._call("video.getStreamInfo", params)["streams"]["stream"] return max(stream, key=lambda s: self.VIDEO_FORMATS.get(s["type"], 0)) def _call(self, method, params): params["method"] = "flickr." 
+ method params["format"] = "json" params["nojsoncallback"] = "1" if self.api_key: params["api_key"] = self.api_key response = self.request(self.API_URL, params=params) try: data = response.json() except ValueError: data = {"code": -1, "message": response.content} if "code" in data: msg = data.get("message") self.log.debug("Server response: %s", data) if data["code"] == 1: raise exception.NotFoundError(self.extractor.subcategory) elif data["code"] == 2: raise exception.AuthorizationError(msg) elif data["code"] == 98: raise exception.AuthenticationError(msg) elif data["code"] == 99: raise exception.AuthorizationError(msg) raise exception.AbortExtraction(f"API request failed: {msg}") return data def _pagination(self, method, params, key="photos"): extras = ("description,date_upload,tags,views,media," "path_alias,owner_name,") if includes := self.extractor.config("metadata"): if isinstance(includes, (list, tuple)): includes = ",".join(includes) elif not isinstance(includes, str): includes = ("license,date_taken,original_format,last_update," "geo,machine_tags,o_dims") extras = extras + includes + "," extras += ",".join("url_" + fmt[0] for fmt in self.formats) params["extras"] = extras params["page"] = 1 while True: data = self._call(method, params)[key] yield from data["photo"] if params["page"] >= data["pages"]: return params["page"] += 1 def _pagination_sets(self, method, params): params["page"] = 1 while True: data = self._call(method, params)["photosets"] yield from data["photoset"] if params["page"] >= data["pages"]: return params["page"] += 1 def _extract_format(self, photo): photo["description"] = photo["description"]["_content"].strip() photo["views"] = text.parse_int(photo["views"]) photo["date"] = text.parse_timestamp(photo["dateupload"]) photo["tags"] = photo["tags"].split() self._extract_metadata(photo) photo["id"] = text.parse_int(photo["id"]) if "owner" not in photo: photo["owner"] = self.extractor.user elif not self.meta_info: photo["owner"] = { "nsid" : photo["owner"], "username" : photo["ownername"], "path_alias": photo["pathalias"], } del photo["pathalias"] del photo["ownername"] if photo["media"] == "video" and self.videos: return self._extract_video(photo) for fmt, fmtname, fmtwidth in self.formats: key = "url_" + fmt if key in photo: photo["width"] = text.parse_int(photo["width_" + fmt]) photo["height"] = text.parse_int(photo["height_" + fmt]) if self.maxsize and (photo["width"] > self.maxsize or photo["height"] > self.maxsize): continue photo["url"] = photo[key] photo["label"] = fmtname # remove excess data keys = [ key for key in photo if key.startswith(("url_", "width_", "height_")) ] for key in keys: del photo[key] break else: self._extract_photo(photo) return photo def _extract_photo(self, photo): size = self.photos_getSizes(photo["id"])[-1] photo["url"] = size["source"] photo["label"] = size["label"] photo["width"] = text.parse_int(size["width"]) photo["height"] = text.parse_int(size["height"]) return photo def _extract_video(self, photo): stream = self.video_getStreamInfo(photo["id"], photo.get("secret")) photo["url"] = stream["_content"] photo["label"] = stream["type"] photo["width"] = photo["height"] = 0 return photo def _extract_metadata(self, photo, info=True): if info and self.meta_info: try: photo.update(self.photos_getInfo(photo["id"])) photo["title"] = photo["title"]["_content"] photo["comments"] = text.parse_int( photo["comments"]["_content"]) photo["description"] = photo["description"]["_content"] photo["tags"] = [t["raw"] for t in photo["tags"]["tag"]] 
photo["views"] = text.parse_int(photo["views"]) photo["id"] = text.parse_int(photo["id"]) except Exception as exc: self.log.warning( "Unable to retrieve 'info' data for %s (%s: %s)", photo["id"], exc.__class__.__name__, exc) if self.meta_exif: try: photo.update(self.photos_getExif(photo["id"])) except Exception as exc: self.log.warning( "Unable to retrieve 'exif' data for %s (%s: %s)", photo["id"], exc.__class__.__name__, exc) if self.meta_contexts: try: photo.update(self.photos_getAllContexts(photo["id"])) except Exception as exc: self.log.warning( "Unable to retrieve 'contexts' data for %s (%s: %s)", photo["id"], exc.__class__.__name__, exc) if "license" in photo: photo["license_name"] = self.LICENSES.get(photo["license"]) def _clean_info(self, info): info["title"] = info["title"]["_content"] info["description"] = info["description"]["_content"] return info ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/foolfuuka.py0000644000175000017500000002204115040344700021101 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2019-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for FoolFuuka 4chan archives""" from .common import BaseExtractor, Message from .. import text import itertools class FoolfuukaExtractor(BaseExtractor): """Base extractor for FoolFuuka based boards/archives""" basecategory = "foolfuuka" filename_fmt = "{timestamp_ms} {filename_media}.{extension}" archive_fmt = "{board[shortname]}_{num}_{timestamp}" external = "default" def __init__(self, match): BaseExtractor.__init__(self, match) if self.category == "b4k": self.remote = self._remote_direct elif self.category == "archivedmoe": self.referer = False self.fixup_redirect = True else: self.fixup_redirect = False def items(self): yield Message.Directory, self.metadata() for post in self.posts(): media = post["media"] if not media: continue url = media["media_link"] if not url and "remote_media_link" in media: url = self.remote(media) if url and url[0] == "/": url = self.root + url post["filename"], _, post["extension"] = \ media["media"].rpartition(".") post["filename_media"] = media["media_filename"].rpartition(".")[0] post["timestamp_ms"] = text.parse_int( media["media_orig"].rpartition(".")[0]) yield Message.Url, url, post def metadata(self): """Return general metadata""" def posts(self): """Return an iterable with all relevant posts""" def remote(self, media): """Resolve a remote media link""" page = self.request(media["remote_media_link"]).text url = text.extr(page, 'http-equiv="Refresh" content="0; url=', '"') if url.startswith("https://thebarchive.com/"): # '.webm' -> '.web' (#5116) if url.endswith(".webm"): url = url[:-1] elif self.fixup_redirect: # update redirect domain or filename (#7652) path, _, filename = url.rpartition("/") # these boards link directly to i.4cdn.org # -> redirect to warosu or 4plebs instead board_domains = { "3" : "warosu.org", "biz": "warosu.org", "ck" : "warosu.org", "diy": "warosu.org", "fa" : "warosu.org", "ic" : "warosu.org", "jp" : "warosu.org", "lit": "warosu.org", "sci": "warosu.org", "tg" : "archive.4plebs.org", } board = url.split("/", 4)[3] if board in board_domains: domain = board_domains[board] url = f"https://{domain}/{board}/full_image/{filename}" # if it's one of these archives, slice the name elif any(archive in path for archive 
in ( "b4k.", "desuarchive.", "palanq.")): name, _, ext = filename.rpartition(".") if len(name) > 13: url = f"{path}/{name[:13]}.{ext}" return url def _remote_direct(self, media): return media["remote_media_link"] BASE_PATTERN = FoolfuukaExtractor.update({ "4plebs": { "root": "https://archive.4plebs.org", "pattern": r"(?:archive\.)?4plebs\.org", }, "archivedmoe": { "root": "https://archived.moe", "pattern": r"archived\.moe", }, "archiveofsins": { "root": "https://archiveofsins.com", "pattern": r"(?:www\.)?archiveofsins\.com", }, "b4k": { "root": "https://arch.b4k.dev", "pattern": r"arch\.b4k\.(?:dev|co)", }, "desuarchive": { "root": "https://desuarchive.org", "pattern": r"desuarchive\.org", }, "fireden": { "root": "https://boards.fireden.net", "pattern": r"boards\.fireden\.net", }, "palanq": { "root": "https://archive.palanq.win", "pattern": r"archive\.palanq\.win", }, "rbt": { "root": "https://rbt.asia", "pattern": r"(?:rbt\.asia|(?:archive\.)?rebeccablacktech\.com)", }, "thebarchive": { "root": "https://thebarchive.com", "pattern": r"thebarchive\.com", }, }) class FoolfuukaThreadExtractor(FoolfuukaExtractor): """Base extractor for threads on FoolFuuka based boards/archives""" subcategory = "thread" directory_fmt = ("{category}", "{board[shortname]}", "{thread_num} {title|comment[:50]}") pattern = BASE_PATTERN + r"/([^/?#]+)/thread/(\d+)" example = "https://archived.moe/a/thread/12345/" def __init__(self, match): FoolfuukaExtractor.__init__(self, match) self.board = self.groups[-2] self.thread = self.groups[-1] self.data = None def metadata(self): url = self.root + "/_/api/chan/thread/" params = {"board": self.board, "num": self.thread} self.data = self.request_json(url, params=params)[self.thread] return self.data["op"] def posts(self): op = (self.data["op"],) if posts := self.data.get("posts"): posts = list(posts.values()) posts.sort(key=lambda p: p["timestamp"]) return itertools.chain(op, posts) return op class FoolfuukaBoardExtractor(FoolfuukaExtractor): """Base extractor for FoolFuuka based boards/archives""" subcategory = "board" pattern = BASE_PATTERN + r"/([^/?#]+)(?:/(?:page/)?(\d*))?$" example = "https://archived.moe/a/" def __init__(self, match): FoolfuukaExtractor.__init__(self, match) self.board = self.groups[-2] self.page = self.groups[-1] def items(self): index_base = f"{self.root}/_/api/chan/index/?board={self.board}&page=" thread_base = f"{self.root}/{self.board}/thread/" page = self.page for pnum in itertools.count(text.parse_int(page, 1)): with self.request(index_base + str(pnum)) as response: try: threads = response.json() except ValueError: threads = None if not threads: return for num, thread in threads.items(): thread["url"] = thread_base + format(num) thread["_extractor"] = FoolfuukaThreadExtractor yield Message.Queue, thread["url"], thread if page: return class FoolfuukaSearchExtractor(FoolfuukaExtractor): """Base extractor for search results on FoolFuuka based boards/archives""" subcategory = "search" directory_fmt = ("{category}", "search", "{search}") pattern = BASE_PATTERN + r"/([^/?#]+)/search((?:/[^/?#]+/[^/?#]+)+)" example = "https://archived.moe/_/search/text/QUERY/" request_interval = (0.5, 1.5) def __init__(self, match): FoolfuukaExtractor.__init__(self, match) self.params = params = {} key = None for arg in self.groups[-1].split("/"): if key: params[key] = text.unescape(arg) key = None else: key = arg board = self.groups[-2] if board != "_": params["boards"] = board def metadata(self): return {"search": self.params.get("text", "")} def posts(self): url = 
self.root + "/_/api/chan/search/" params = self.params.copy() params["page"] = text.parse_int(params.get("page"), 1) if "filter" not in params: params["filter"] = "text" while True: try: data = self.request_json(url, params=params) except ValueError: return if isinstance(data, dict): if data.get("error"): return posts = data["0"]["posts"] elif isinstance(data, list): posts = data[0]["posts"] else: return yield from posts if len(posts) <= 3: return params["page"] += 1 class FoolfuukaGalleryExtractor(FoolfuukaExtractor): """Base extractor for FoolFuuka galleries""" subcategory = "gallery" directory_fmt = ("{category}", "{board}", "gallery") pattern = BASE_PATTERN + r"/([^/?#]+)/gallery(?:/(\d+))?" example = "https://archived.moe/a/gallery" def metadata(self): self.board = board = self.groups[-2] return {"board": board} def posts(self): pnum = self.groups[-1] pages = itertools.count(1) if pnum is None else (pnum,) base = f"{self.root}/_/api/chan/gallery/?board={self.board}&page=" for pnum in pages: posts = self.request_json(f"{base}{pnum}") if not posts: return yield from posts ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/foolslide.py0000644000175000017500000001042415040344700021070 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2016-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for FoOlSlide based sites""" from .common import BaseExtractor, Message from .. import text, util class FoolslideExtractor(BaseExtractor): """Base class for FoOlSlide extractors""" basecategory = "foolslide" def __init__(self, match): BaseExtractor.__init__(self, match) self.page_url = self.root + self.groups[-1] def request(self, url): return BaseExtractor.request( self, url, encoding="utf-8", method="POST", data={"adult": "true"}) def parse_chapter_url(self, url, data): info = url.partition("/read/")[2].rstrip("/").split("/") lang = info[1].partition("-")[0] data["lang"] = lang data["language"] = util.code_to_language(lang) data["volume"] = text.parse_int(info[2]) data["chapter"] = text.parse_int(info[3]) data["chapter_minor"] = "." 
+ info[4] if len(info) >= 5 else "" data["title"] = data["chapter_string"].partition(":")[2].strip() return data BASE_PATTERN = FoolslideExtractor.update({ }) class FoolslideChapterExtractor(FoolslideExtractor): """Base class for chapter extractors for FoOlSlide based sites""" subcategory = "chapter" directory_fmt = ("{category}", "{manga}", "{chapter_string}") filename_fmt = ( "{manga}_c{chapter:>03}{chapter_minor:?//}_{page:>03}.{extension}") archive_fmt = "{id}" pattern = BASE_PATTERN + r"(/read/[^/?#]+/[a-z-]+/\d+/\d+(?:/\d+)?)" example = "https://read.powermanga.org/read/MANGA/en/0/123/" def items(self): page = self.request(self.page_url).text data = self.metadata(page) imgs = self.images(page) data["count"] = len(imgs) data["chapter_id"] = text.parse_int(imgs[0]["chapter_id"]) yield Message.Directory, data enum = util.enumerate_reversed if self.config( "page-reverse") else enumerate for data["page"], image in enum(imgs, 1): try: url = image["url"] del image["url"] del image["chapter_id"] del image["thumb_url"] except KeyError: pass for key in ("height", "id", "size", "width"): image[key] = text.parse_int(image[key]) data.update(image) text.nameext_from_url(data["filename"], data) yield Message.Url, url, data def metadata(self, page): extr = text.extract_from(page) extr('

    ', '') return self.parse_chapter_url(self.page_url, { "manga" : text.unescape(extr('title="', '"')).strip(), "chapter_string": text.unescape(extr('title="', '"')), }) def images(self, page): return util.json_loads(text.extr(page, "var pages = ", ";")) class FoolslideMangaExtractor(FoolslideExtractor): """Base class for manga extractors for FoOlSlide based sites""" subcategory = "manga" categorytransfer = True pattern = BASE_PATTERN + r"(/series/[^/?#]+)" example = "https://read.powermanga.org/series/MANGA/" def items(self): page = self.request(self.page_url).text chapters = self.chapters(page) if not self.config("chapter-reverse", False): chapters.reverse() for chapter, data in chapters: data["_extractor"] = FoolslideChapterExtractor yield Message.Queue, chapter, data def chapters(self, page): extr = text.extract_from(page) manga = text.unescape(extr('

    ', '

    ')).strip() author = extr('Author: ', 'Artist: ', '
    ")) path = extr('href="//d', '"') if not path: msg = text.remove_html( extr('System Message', '') or extr('System Message', '
    ') ).partition(" . Continue ")[0] return self.log.warning( "Unable to download post %s (\"%s\")", post_id, msg) pi = text.parse_int rh = text.remove_html data = text.nameext_from_url(path, { "id" : pi(post_id), "url": "https://d" + path, }) if self._new_layout: data["tags"] = text.split_html(extr( 'class="tags-row">', '')) data["scraps"] = (extr(' submissions">', "<") == "Scraps") data["title"] = text.unescape(extr("

    ", "

    ")) data["artist_url"] = extr('title="', '"').strip() data["artist"] = extr(">", "<") data["_description"] = extr( 'class="submission-description user-submitted-links">', ' ') data["views"] = pi(rh(extr('class="views">', ''))) data["favorites"] = pi(rh(extr('class="favorites">', ''))) data["comments"] = pi(rh(extr('class="comments">', ''))) data["rating"] = rh(extr('class="rating">', '')) data["fa_category"] = rh(extr('>Category', '')) data["theme"] = rh(extr('>', '<')) data["species"] = rh(extr('>Species', '')) data["gender"] = rh(extr('>Gender', '')) data["width"] = pi(extr("", "x")) data["height"] = pi(extr("", "p")) data["folders"] = folders = [] for folder in extr( "

    Listed in Folders

    ", "").split(""): if folder := rh(folder): folders.append(folder) else: # old site layout data["scraps"] = ( "/scraps/" in extr('class="minigallery-title', "")) data["title"] = text.unescape(extr("

    ", "

    ")) data["artist_url"] = extr('title="', '"').strip() data["artist"] = extr(">", "<") data["fa_category"] = extr("Category:", "<").strip() data["theme"] = extr("Theme:", "<").strip() data["species"] = extr("Species:", "<").strip() data["gender"] = extr("Gender:", "<").strip() data["favorites"] = pi(extr("Favorites:", "<")) data["comments"] = pi(extr("Comments:", "<")) data["views"] = pi(extr("Views:", "<")) data["width"] = pi(extr("Resolution:", "x")) data["height"] = pi(extr("", "<")) data["tags"] = text.split_html(extr( 'id="keywords">', ''))[::2] data["rating"] = extr('', ' ')
            data[', ' ') data["folders"] = () # folders not present in old layout data["user"] = self.user or data["artist_url"] data["date"] = text.parse_timestamp(data["filename"].partition(".")[0]) data["description"] = self._process_description(data["_description"]) data["thumbnail"] = (f"https://t.furaffinity.net/{post_id}@600-" f"{path.rsplit('/', 2)[1]}.jpg") return data def _process_description(self, description): return text.unescape(text.remove_html(description, "", "")) def _pagination(self, path, folder=None): num = 1 folder = "" if folder is None else f"/folder/{folder}/a" while True: url = f"{self.root}/{path}/{self.user}{folder}/{num}/" page = self.request(url).text post_id = None for post_id in text.extract_iter(page, 'id="sid-', '"'): yield post_id if not post_id: return num += 1 def _pagination_favorites(self): path = f"/favorites/{self.user}/" while path: page = self.request(self.root + path).text extr = text.extract_from(page) while True: post_id = extr('id="sid-', '"') if not post_id: break self._favorite_id = text.parse_int(extr('data-fav-id="', '"')) yield post_id pos = page.find('type="submit">Next') if pos >= 0: path = text.rextr(page, '
    Next 48")) < 0 and \ (pos := page.find(">>>> Next 48 >>")) < 0: return path = text.rextr(page, 'href="', '"', pos) url = self.root + text.unescape(path) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/furry34.py0000644000175000017500000001077415040344700020436 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://furry34.com/""" from .booru import BooruExtractor from .. import text import collections BASE_PATTERN = r"(?:https?://)?(?:www\.)?furry34\.com" class Furry34Extractor(BooruExtractor): category = "furry34" root = "https://furry34.com" root_cdn = "https://furry34com.b-cdn.net" filename_fmt = "{category}_{id}.{extension}" per_page = 30 TAG_TYPES = { None: "general", 1 : "general", 2 : "copyright", 4 : "character", 8 : "artist", } FORMATS = ( ("100", "mov.mp4"), ("101", "mov720.mp4"), ("102", "mov480.mp4"), ("10" , "pic.jpg"), ) def _file_url(self, post): files = post["files"] for fmt, extension in self.FORMATS: if fmt in files: break else: fmt = next(iter(files)) post_id = post["id"] root = self.root_cdn if files[fmt][0] else self.root post["file_url"] = url = \ f"{root}/posts/{post_id // 1000}/{post_id}/{post_id}.{extension}" post["format_id"] = fmt post["format"] = extension.partition(".")[0] return url def _prepare(self, post): post.pop("files", None) post["date"] = text.parse_datetime( post["created"], "%Y-%m-%dT%H:%M:%S.%fZ") post["filename"], _, post["format"] = post["filename"].rpartition(".") if "tags" in post: post["tags"] = [t["value"] for t in post["tags"]] def _tags(self, post, _): if "tags" not in post: post.update(self._fetch_post(post["id"])) tags = collections.defaultdict(list) for tag in post["tags"]: tags[tag["type"] or 1].append(tag["value"]) types = self.TAG_TYPES for type, values in tags.items(): post["tags_" + types[type]] = values def _fetch_post(self, post_id): url = f"{self.root}/api/v2/post/{post_id}" return self.request_json(url) def _pagination(self, endpoint, params=None): url = f"{self.root}/api{endpoint}" if params is None: params = {} params["sortBy"] = 0 params["take"] = self.per_page threshold = self.per_page while True: data = self.request_json(url, method="POST", json=params) yield from data["items"] if len(data["items"]) < threshold: return params["cursor"] = data.get("cursor") class Furry34PostExtractor(Furry34Extractor): subcategory = "post" archive_fmt = "{id}" pattern = BASE_PATTERN + r"/post/(\d+)" example = "https://furry34.com/post/12345" def posts(self): return (self._fetch_post(self.groups[0]),) class Furry34PlaylistExtractor(Furry34Extractor): subcategory = "playlist" directory_fmt = ("{category}", "{playlist_id}") archive_fmt = "p_{playlist_id}_{id}" pattern = BASE_PATTERN + r"/playlists/view/(\d+)" example = "https://furry34.com/playlists/view/12345" def metadata(self): return {"playlist_id": self.groups[0]} def posts(self): endpoint = "/v2/post/search/playlist/" + self.groups[0] return self._pagination(endpoint) class Furry34TagExtractor(Furry34Extractor): subcategory = "tag" directory_fmt = ("{category}", "{search_tags}") archive_fmt = "t_{search_tags}_{id}" pattern = BASE_PATTERN + r"/(?:([^/?#]+))?(?:/?\?([^#]+))?(?:$|#)" example = "https://furry34.com/TAG" def _init(self): tag, query = self.groups params = 
text.parse_query(query) self.tags = tags = [] if tag: tags.extend(text.unquote(text.unquote(tag)).split("|")) if "tags" in params: tags.extend(params["tags"].split("|")) type = params.get("type") if type == "video": self.type = 1 elif type == "image": self.type = 0 else: self.type = None def metadata(self): return {"search_tags": " ".join(self.tags)} def posts(self): endpoint = "/v2/post/search/root" params = {"includeTags": [t.replace("_", " ") for t in self.tags]} if self.type is not None: params["type"] = self.type return self._pagination(endpoint, params) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/fuskator.py0000644000175000017500000000557515040344700020761 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2019-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://fuskator.com/""" from .common import GalleryExtractor, Extractor, Message from .. import text import time class FuskatorGalleryExtractor(GalleryExtractor): """Extractor for image galleries on fuskator.com""" category = "fuskator" root = "https://fuskator.com" pattern = r"(?:https?://)?fuskator\.com/(?:thumbs|expanded)/([^/?#]+)" example = "https://fuskator.com/thumbs/ID/" def __init__(self, match): self.gallery_hash = match[1] url = f"{self.root}/thumbs/{self.gallery_hash}/index.html" GalleryExtractor.__init__(self, match, url) def metadata(self, page): headers = { "Referer" : self.page_url, "X-Requested-With": "XMLHttpRequest", } auth = self.request( self.root + "/ajax/auth.aspx", method="POST", headers=headers, ).text params = { "X-Auth": auth, "hash" : self.gallery_hash, "_" : int(time.time()), } self.data = data = self.request_json( self.root + "/ajax/gal.aspx", params=params, headers=headers) title = text.extr(page, "", "").strip() title, _, gallery_id = title.rpartition("#") return { "gallery_id" : text.parse_int(gallery_id), "gallery_hash": self.gallery_hash, "title" : text.unescape(title[:-15]), "views" : data.get("hits"), "score" : data.get("rating"), "tags" : (data.get("tags") or "").split(","), } def images(self, page): return [ ("https:" + image["imageUrl"], image) for image in self.data["images"] ] class FuskatorSearchExtractor(Extractor): """Extractor for search results on fuskator.com""" category = "fuskator" subcategory = "search" root = "https://fuskator.com" pattern = r"(?:https?://)?fuskator\.com(/(?:search|page)/.+)" example = "https://fuskator.com/search/TAG/" def __init__(self, match): Extractor.__init__(self, match) self.path = match[1] def items(self): url = self.root + self.path data = {"_extractor": FuskatorGalleryExtractor} while True: page = self.request(url).text for path in text.extract_iter( page, 'class="pic_pad">', '>>><') if not pages: return url = self.root + text.rextr(pages, 'href="', '"') ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/gelbooru.py0000644000175000017500000002300415040344700020724 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2014-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://gelbooru.com/""" from .common import Extractor, Message from . 
import gelbooru_v02 from .. import text, exception import binascii BASE_PATTERN = r"(?:https?://)?(?:www\.)?gelbooru\.com/(?:index\.php)?\?" class GelbooruBase(): """Base class for gelbooru extractors""" category = "gelbooru" basecategory = "booru" root = "https://gelbooru.com" offset = 0 def _api_request(self, params, key="post", log=False): if "s" not in params: params["s"] = "post" params["api_key"] = self.api_key params["user_id"] = self.user_id url = self.root + "/index.php?page=dapi&q=index&json=1" try: data = self.request_json(url, params=params) except exception.HttpError as exc: if exc.status == 401: raise exception.AuthorizationError( f"'api-key' and 'user-id' required " f"({exc.status}: {exc.response.reason})") raise if not key: return data try: posts = data[key] except KeyError: if log: self.log.error("Incomplete API response (missing '%s')", key) self.log.debug("%s", data) return [] if not isinstance(posts, list): return (posts,) return posts def _pagination(self, params): params["pid"] = self.page_start params["limit"] = self.per_page limit = self.per_page // 2 pid = False if "tags" in params: tags = params["tags"].split() op = "<" id = False for tag in tags: if tag.startswith("sort:"): if tag == "sort:id:asc": op = ">" elif tag == "sort:id" or tag.startswith("sort:id:"): op = "<" else: pid = True elif tag.startswith("id:"): id = True if not pid: if id: tag = "id:" + op tags = [t for t in tags if not t.startswith(tag)] tags = f"{' '.join(tags)} id:{op}" while True: posts = self._api_request(params) yield from posts if len(posts) < limit: return if pid: params["pid"] += 1 else: if "pid" in params: del params["pid"] params["tags"] = tags + str(posts[-1]["id"]) def _pagination_html(self, params): url = self.root + "/index.php" params["pid"] = self.offset data = {} while True: num_ids = 0 page = self.request(url, params=params).text for data["id"] in text.extract_iter(page, '" id="p', '"'): num_ids += 1 yield from self._api_request(data) if num_ids < self.per_page: return params["pid"] += self.per_page def _file_url(self, post): url = post["file_url"] if url.endswith((".webm", ".mp4")): post["_fallback"] = (url,) md5 = post["md5"] root = text.root_from_url(post["preview_url"]) path = f"/images/{md5[0:2]}/{md5[2:4]}/{md5}.webm" url = root + path return url def _notes(self, post, page): notes_data = text.extr(page, '
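# Illustrative sketch of the deep-pagination strategy in _pagination() above:
# rather than ever-growing "pid" page offsets, the follow-up request filters
# on "id:<LAST_ID" (or "id:>" when sorting ascending) so the API returns the
# next batch relative to the last post already seen. Values are examples.
def next_params(params, last_post_id, op="<"):
    params = dict(params)
    params.pop("pid", None)
    base = " ".join(t for t in params.get("tags", "").split()
                    if not t.startswith("id:"))
    params["tags"] = f"{base} id:{op}{last_post_id}".strip()
    return params

print(next_params({"tags": "cat_ears", "pid": 0}, 9876543))
# {'tags': 'cat_ears id:<9876543'}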
    ') if not notes_data: return post["notes"] = notes = [] extr = text.extract for note in text.extract_iter(notes_data, ''): notes.append({ "width" : int(extr(note, 'data-width="', '"')[0]), "height": int(extr(note, 'data-height="', '"')[0]), "x" : int(extr(note, 'data-x="', '"')[0]), "y" : int(extr(note, 'data-y="', '"')[0]), "body" : extr(note, 'data-body="', '"')[0], }) def _skip_offset(self, num): self.offset += num return num class GelbooruTagExtractor(GelbooruBase, gelbooru_v02.GelbooruV02TagExtractor): """Extractor for images from gelbooru.com based on search-tags""" pattern = BASE_PATTERN + r"page=post&s=list&tags=([^&#]*)" example = "https://gelbooru.com/index.php?page=post&s=list&tags=TAG" class GelbooruPoolExtractor(GelbooruBase, gelbooru_v02.GelbooruV02PoolExtractor): """Extractor for gelbooru pools""" per_page = 45 pattern = BASE_PATTERN + r"page=pool&s=show&id=(\d+)" example = "https://gelbooru.com/index.php?page=pool&s=show&id=12345" skip = GelbooruBase._skip_offset def metadata(self): url = self.root + "/index.php" self._params = { "page": "pool", "s" : "show", "id" : self.pool_id, } page = self.request(url, params=self._params).text name, pos = text.extract(page, "

    Now Viewing: ", "

    ") if not name: raise exception.NotFoundError("pool") return { "pool": text.parse_int(self.pool_id), "pool_name": text.unescape(name), } def posts(self): return self._pagination_html(self._params) class GelbooruFavoriteExtractor(GelbooruBase, gelbooru_v02.GelbooruV02FavoriteExtractor): """Extractor for gelbooru favorites""" per_page = 100 pattern = BASE_PATTERN + r"page=favorites&s=view&id=(\d+)" example = "https://gelbooru.com/index.php?page=favorites&s=view&id=12345" skip = GelbooruBase._skip_offset def posts(self): # get number of favorites params = { "s" : "favorite", "id" : self.favorite_id, "limit": "2", } data = self._api_request(params, None, True) count = data["@attributes"]["count"] self.log.debug("API reports %s favorite entries", count) favs = data["favorite"] try: order = 1 if favs[0]["id"] < favs[1]["id"] else -1 except LookupError as exc: self.log.debug( "Error when determining API favorite order (%s: %s)", exc.__class__.__name__, exc) order = -1 else: self.log.debug("API yields favorites in %sscending order", "a" if order > 0 else "de") order_favs = self.config("order-posts") if order_favs and order_favs[0] in ("r", "a"): self.log.debug("Returning them in reverse") order = -order if order < 0: return self._pagination(params, count) return self._pagination_reverse(params, count) def _pagination(self, params, count): if self.offset: pnum, skip = divmod(self.offset, self.per_page) else: pnum = skip = 0 params["pid"] = pnum params["limit"] = self.per_page while True: favs = self._api_request(params, "favorite") if not favs: return if skip: favs = favs[skip:] skip = 0 for fav in favs: for post in self._api_request({"id": fav["favorite"]}): post["date_favorited"] = text.parse_timestamp(fav["added"]) yield post params["pid"] += 1 def _pagination_reverse(self, params, count): pnum, last = divmod(count-1, self.per_page) if self.offset > last: # page number change self.offset -= last diff, self.offset = divmod(self.offset-1, self.per_page) pnum -= diff + 1 skip = self.offset params["pid"] = pnum params["limit"] = self.per_page while True: favs = self._api_request(params, "favorite") favs.reverse() if skip: favs = favs[skip:] skip = 0 for fav in favs: for post in self._api_request({"id": fav["favorite"]}): post["date_favorited"] = text.parse_timestamp(fav["added"]) yield post params["pid"] -= 1 if params["pid"] < 0: return class GelbooruPostExtractor(GelbooruBase, gelbooru_v02.GelbooruV02PostExtractor): """Extractor for single images from gelbooru.com""" pattern = (BASE_PATTERN + r"(?=(?:[^#]+&)?page=post(?:&|#|$))" r"(?=(?:[^#]+&)?s=view(?:&|#|$))" r"(?:[^#]+&)?id=(\d+)") example = "https://gelbooru.com/index.php?page=post&s=view&id=12345" class GelbooruRedirectExtractor(GelbooruBase, Extractor): subcategory = "redirect" pattern = (r"(?:https?://)?(?:www\.)?gelbooru\.com" r"/redirect\.php\?s=([^&#]+)") example = "https://gelbooru.com/redirect.php?s=BASE64" def __init__(self, match): Extractor.__init__(self, match) self.url_base64 = match[1] def items(self): url = text.ensure_http_scheme(binascii.a2b_base64( self.url_base64).decode()) data = {"_extractor": GelbooruPostExtractor} yield Message.Queue, url, data ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/gelbooru_v01.py0000644000175000017500000001012715040344700021414 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2021-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU 
General Public License version 2 as # published by the Free Software Foundation. """Extractors for Gelbooru Beta 0.1.11 sites""" from . import booru from .. import text class GelbooruV01Extractor(booru.BooruExtractor): basecategory = "gelbooru_v01" per_page = 20 def _parse_post(self, post_id): url = f"{self.root}/index.php?page=post&s=view&id={post_id}" extr = text.extract_from(self.request(url).text) post = { "id" : post_id, "created_at": extr('Posted: ', ' <'), "uploader" : extr('By: ', ' <'), "width" : extr('Size: ', 'x'), "height" : extr('', ' <'), "source" : extr('Source: ', ' <'), "rating" : (extr('Rating: ', '<') or "?")[0].lower(), "score" : extr('Score: ', ' <'), "file_url" : extr('img', '<')), } post["md5"] = post["file_url"].rpartition("/")[2].partition(".")[0] post["date"] = text.parse_datetime( post["created_at"], "%Y-%m-%d %H:%M:%S") return post def skip(self, num): self.page_start += num return num def _pagination(self, url, begin, end): pid = self.page_start while True: page = self.request(url + str(pid)).text cnt = 0 for post_id in text.extract_iter(page, begin, end): yield self._parse_post(post_id) cnt += 1 if cnt < self.per_page: return pid += self.per_page BASE_PATTERN = GelbooruV01Extractor.update({ "thecollection": { "root": "https://the-collection.booru.org", "pattern": r"the-collection\.booru\.org", }, "illusioncardsbooru": { "root": "https://illusioncards.booru.org", "pattern": r"illusioncards\.booru\.org", }, "allgirlbooru": { "root": "https://allgirl.booru.org", "pattern": r"allgirl\.booru\.org", }, "drawfriends": { "root": "https://drawfriends.booru.org", "pattern": r"drawfriends\.booru\.org", }, "vidyart2": { "root": "https://vidyart2.booru.org", "pattern": r"vidyart2\.booru\.org", }, }) class GelbooruV01TagExtractor(GelbooruV01Extractor): subcategory = "tag" directory_fmt = ("{category}", "{search_tags}") archive_fmt = "t_{search_tags}_{id}" pattern = BASE_PATTERN + r"/index\.php\?page=post&s=list&tags=([^&#]+)" example = "https://allgirl.booru.org/index.php?page=post&s=list&tags=TAG" def metadata(self): self.tags = tags = self.groups[-1] return {"search_tags": text.unquote(tags.replace("+", " "))} def posts(self): url = f"{self.root}/index.php?page=post&s=list&tags={self.tags}&pid=" return self._pagination(url, 'class="thumb">
    = total: return if not num: self.log.debug("Empty response - Retrying") continue params["pid"] += 1 def _pagination_html(self, params): url = self.root + "/index.php" params["pid"] = self.page_start * self.per_page data = {} find_ids = util.re(r"\sid=\"p(\d+)").findall while True: page = self.request(url, params=params).text pids = find_ids(page) for data["id"] in pids: for post in self._api_request(data): yield post.attrib if len(pids) < self.per_page: return params["pid"] += self.per_page def _file_url_rule34(self, post): url = post["file_url"] if text.ext_from_url(url) not in util.EXTS_VIDEO: path = url.partition(".")[2] post["_fallback"] = (url,) post["file_url"] = url = "https://wimg." + path return url def _prepare(self, post): post["tags"] = post["tags"].strip() post["date"] = text.parse_datetime( post["created_at"], "%a %b %d %H:%M:%S %z %Y") def _html(self, post): url = f"{self.root}/index.php?page=post&s=view&id={post['id']}" return self.request(url).text def _tags(self, post, page): tag_container = (text.extr(page, '
    ') .replace("\r\n", "\n"), "", "")), "ratings" : [text.unescape(r) for r in text.extract_iter(extr( "class='ratings_box'", ""), "title='", "'")], "date" : text.parse_datetime(extr("datetime='", "'")), "views" : text.parse_int(extr(">Views", "<")), "score" : text.parse_int(extr(">Vote Score", "<")), "media" : text.unescape(extr(">Media", "<").strip()), "tags" : text.split_html(extr( ">Tags ", "")), } body = data["_body"] if "", "").rpartition(">")[2]), "author" : text.unescape(extr('alt="', '"')), "date" : text.parse_datetime(extr( ">Updated<", "").rpartition(">")[2], "%B %d, %Y"), "status" : extr("class='indent'>", "<"), } for c in ("Chapters", "Words", "Comments", "Views", "Rating"): data[c.lower()] = text.parse_int(extr( ">" + c + ":", "<").replace(",", "")) data["description"] = text.unescape(extr( "class='storyDescript'>", ""), "title='", "'")] return text.nameext_from_url(data["src"], data) def _request_check(self, url, **kwargs): self.request = self._request_original # check for Enter button / front page # and update PHPSESSID and content filters if necessary response = self.request(url, **kwargs) content = response.content if len(content) < 5000 and \ b'
    ', '') class HentaifoundryStoryExtractor(HentaifoundryExtractor): """Extractor for a hentaifoundry story""" subcategory = "story" archive_fmt = "s_{index}" pattern = BASE_PATTERN + r"/stories/user/([^/?#]+)/(\d+)" example = "https://www.hentai-foundry.com/stories/user/USER/12345/TITLE" skip = Extractor.skip def __init__(self, match): HentaifoundryExtractor.__init__(self, match) self.index = match[3] def items(self): story_url = (f"{self.root}/stories/user/{self.user}" f"/{self.index}/x?enterAgree=1") story = self._parse_story(self.request(story_url).text) yield Message.Directory, story yield Message.Url, story["src"], story ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/hentaihand.py0000644000175000017500000000635515040344700021223 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2020-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://hentaihand.com/""" from .common import GalleryExtractor, Extractor, Message from .. import text, util class HentaihandGalleryExtractor(GalleryExtractor): """Extractor for image galleries on hentaihand.com""" category = "hentaihand" root = "https://hentaihand.com" pattern = r"(?:https?://)?(?:www\.)?hentaihand\.com/\w+/comic/([\w-]+)" example = "https://hentaihand.com/en/comic/TITLE" def __init__(self, match): self.slug = match[1] url = f"{self.root}/api/comics/{self.slug}" GalleryExtractor.__init__(self, match, url) def metadata(self, page): info = util.json_loads(page) data = { "gallery_id" : text.parse_int(info["id"]), "title" : info["title"], "title_alt" : info["alternative_title"], "slug" : self.slug, "type" : info["category"]["name"], "language" : info["language"]["name"], "lang" : util.language_to_code(info["language"]["name"]), "tags" : [t["slug"] for t in info["tags"]], "date" : text.parse_datetime( info["uploaded_at"], "%Y-%m-%d"), } for key in ("artists", "authors", "groups", "characters", "relationships", "parodies"): data[key] = [v["name"] for v in info[key]] return data def images(self, _): info = self.request_json(self.page_url + "/images") return [(img["source_url"], img) for img in info["images"]] class HentaihandTagExtractor(Extractor): """Extractor for tag searches on hentaihand.com""" category = "hentaihand" subcategory = "tag" root = "https://hentaihand.com" pattern = (r"(?i)(?:https?://)?(?:www\.)?hentaihand\.com" r"/\w+/(parody|character|tag|artist|group|language" r"|category|relationship)/([^/?#]+)") example = "https://hentaihand.com/en/tag/TAG" def __init__(self, match): Extractor.__init__(self, match) self.type, self.key = match.groups() def items(self): if self.type[-1] == "y": tpl = self.type[:-1] + "ies" else: tpl = self.type + "s" url = f"{self.root}/api/{tpl}/{self.key}" tid = self.request_json(url, notfound=self.type)["id"] url = self.root + "/api/comics" params = { "per_page": "18", tpl : tid, "page" : 1, "q" : "", "sort" : "uploaded_at", "order" : "desc", "duration": "day", } while True: info = self.request_json(url, params=params) for gallery in info["data"]: gurl = f"{self.root}/en/comic/{gallery['slug']}" gallery["_extractor"] = HentaihandGalleryExtractor yield Message.Queue, gurl, gallery if params["page"] >= info["last_page"]: return params["page"] += 1 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 
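# A small standalone restatement (illustrative only; "_api_collection_name" is a
# hypothetical helper, not part of hentaihand.py above) of the endpoint
# pluralization used by HentaihandTagExtractor.items(): the type matched from the
# URL is turned into the API collection name used to look up the tag's numeric ID
# before /api/comics is queried.

def _api_collection_name(type_name):
    # mirrors the branch in HentaihandTagExtractor.items()
    if type_name[-1] == "y":
        return type_name[:-1] + "ies"   # "parody" -> "parodies"
    return type_name + "s"              # "tag" -> "tags", "artist" -> "artists"


assert _api_collection_name("parody") == "parodies"
assert _api_collection_name("category") == "categories"
assert _api_collection_name("tag") == "tags"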
gallery_dl-1.30.2/gallery_dl/extractor/hentaihere.py0000644000175000017500000000706215040344700021230 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2016-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://hentaihere.com/""" from .common import ChapterExtractor, MangaExtractor from .. import text, util class HentaihereBase(): """Base class for hentaihere extractors""" category = "hentaihere" root = "https://hentaihere.com" class HentaihereChapterExtractor(HentaihereBase, ChapterExtractor): """Extractor for a single manga chapter from hentaihere.com""" archive_fmt = "{chapter_id}_{page}" pattern = r"(?:https?://)?(?:www\.)?hentaihere\.com/m/S(\d+)/([^/?#]+)" example = "https://hentaihere.com/m/S12345/1/1/" def __init__(self, match): self.manga_id, self.chapter = match.groups() url = f"{self.root}/m/S{self.manga_id}/{self.chapter}/1" ChapterExtractor.__init__(self, match, url) def metadata(self, page): title = text.extr(page, "", "") chapter_id = text.extr(page, 'report/C', '"') chapter, sep, minor = self.chapter.partition(".") match = util.re( r"Page 1 \| (.+) \(([^)]+)\) - Chapter \d+: (.+) by " r"(.+) at ").match(title) return { "manga": match[1], "manga_id": text.parse_int(self.manga_id), "chapter": text.parse_int(chapter), "chapter_minor": sep + minor, "chapter_id": text.parse_int(chapter_id), "type": match[2], "title": match[3], "author": match[4], "lang": "en", "language": "English", } def images(self, page): images = text.extr(page, "var rff_imageList = ", ";") return [ ("https://hentaicdn.com/hentai" + part, None) for part in util.json_loads(images) ] class HentaihereMangaExtractor(HentaihereBase, MangaExtractor): """Extractor for hmanga from hentaihere.com""" chapterclass = HentaihereChapterExtractor pattern = r"(?:https?://)?(?:www\.)?hentaihere\.com(/m/S\d+)/?$" example = "https://hentaihere.com/m/S12345" def chapters(self, page): results = [] pos = page.find('itemscope itemtype="http://schema.org/Book') + 1 manga, pos = text.extract( page, '', '', pos) mtype, pos = text.extract( page, '[', ']', pos) manga_id = text.parse_int( self.page_url.rstrip("/").rpartition("/")[2][1:]) while True: marker, pos = text.extract( page, '
  • ', '', pos) if marker is None: return results url, pos = text.extract(page, '\n', '<', pos) chapter_id, pos = text.extract(page, '/C', '"', pos) chapter, _, title = text.unescape(chapter).strip().partition(" - ") chapter, sep, minor = chapter.partition(".") results.append((url, { "manga_id": manga_id, "manga": manga, "chapter": text.parse_int(chapter), "chapter_minor": sep + minor, "chapter_id": text.parse_int(chapter_id), "type": mtype, "title": title, "lang": "en", "language": "English", })) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/hentainexus.py0000644000175000017500000001310315040344700021440 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2019-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://hentainexus.com/""" from .common import GalleryExtractor, Extractor, Message from .. import text, util import binascii class HentainexusGalleryExtractor(GalleryExtractor): """Extractor for hentainexus galleries""" category = "hentainexus" root = "https://hentainexus.com" pattern = (r"(?i)(?:https?://)?(?:www\.)?hentainexus\.com" r"/(?:view|read)/(\d+)") example = "https://hentainexus.com/view/12345" def __init__(self, match): self.gallery_id = match[1] url = f"{self.root}/view/{self.gallery_id}" GalleryExtractor.__init__(self, match, url) def metadata(self, page): rmve = text.remove_html extr = text.extract_from(page) data = { "gallery_id": text.parse_int(self.gallery_id), "cover" : extr('"og:image" content="', '"'), "title" : extr('

    ', '

    '), } for key in ("Artist", "Book", "Circle", "Event", "Language", "Magazine", "Parody", "Publisher", "Description"): value = rmve(extr('viewcolumn">' + key + '', '')) value, sep, rest = value.rpartition(" (") data[key.lower()] = value if sep else rest data["tags"] = tags = [] for k in text.extract_iter(page, '
    > 1 ^ 0xc else: C = C >> 1 k = primes[C & 0x7] x = 0 S = list(range(256)) for i in range(256): x = (x + S[i] + key[i % len(key)]) % 256 S[i], S[x] = S[x], S[i] result = "" a = c = m = x = 0 for n in range(64, len(blob)): a = (a + k) % 256 x = (c + S[(x + S[a]) % 256]) % 256 c = (c + a + S[a]) % 256 S[a], S[x] = S[x], S[a] m = S[(x + S[(a + S[(m + c) % 256]) % 256]) % 256] result += chr(blob[n] ^ m) return result def _join_title(self, data): event = data['event'] artist = data['artist'] circle = data['circle'] title = data['title'] parody = data['parody'] book = data['book'] magazine = data['magazine'] # a few galleries have a large number of artists or parodies, # which get replaced with "Various" in the title string if artist.count(',') >= 3: artist = 'Various' if parody.count(',') >= 3: parody = 'Various' jt = '' if event: jt += f'({event}) ' if circle: jt += f'[{circle} ({artist})] ' else: jt += f'[{artist}] ' jt += title if parody.lower() != 'original work': jt += f' ({parody})' if book: jt += f' ({book})' if magazine: jt += f' ({magazine})' return jt class HentainexusSearchExtractor(Extractor): """Extractor for hentainexus search results""" category = "hentainexus" subcategory = "search" root = "https://hentainexus.com" pattern = (r"(?i)(?:https?://)?(?:www\.)?hentainexus\.com" r"(?:/page/\d+)?/?(?:\?(q=[^/?#]+))?$") example = "https://hentainexus.com/?q=QUERY" def items(self): params = text.parse_query(self.groups[0]) data = {"_extractor": HentainexusGalleryExtractor} path = "/" while path: page = self.request(self.root + path, params=params).text extr = text.extract_from(page) while True: gallery_id = extr('', '<')), "author" : text.remove_html(extr( 'class="author-content">', '
  • ')), "artist" : text.remove_html(extr( 'class="artist-content">', '')), "genre" : text.split_html(extr( 'class="genres-content">', ''))[::2], "type" : extr( 'class="summary-content">', '<').strip(), "release": text.parse_int(text.remove_html(extr( 'class="summary-content">', ''))), "status" : extr( 'class="summary-content">', '<').strip(), "description": text.remove_html(text.unescape(extr( '
    ', "
    "))), "language": "English", "lang" : "en", } def chapter_data(self, chapter): if chapter.startswith("chapter-"): chapter = chapter[8:] chapter, _, minor = chapter.partition("-") data = { "chapter" : text.parse_int(chapter), "chapter_minor": "." + minor if minor and minor != "end" else "", } data.update(self.manga_data(self.manga.lower())) return data class HiperdexChapterExtractor(HiperdexBase, ChapterExtractor): """Extractor for hiperdex manga chapters""" pattern = BASE_PATTERN + r"(/mangas?/([^/?#]+)/([^/?#]+))" example = "https://hiperdex.com/manga/MANGA/CHAPTER/" def __init__(self, match): root, path, self.manga, self.chapter = match.groups() self.root = text.ensure_http_scheme(root) ChapterExtractor.__init__(self, match, self.root + path + "/") def metadata(self, _): return self.chapter_data(self.chapter) def images(self, page): pattern = util.re(r'id="image-\d+"\s+(?:data-)?src="([^"]+)') return [ (url.strip(), None) for url in pattern.findall(page) ] class HiperdexMangaExtractor(HiperdexBase, MangaExtractor): """Extractor for hiperdex manga""" chapterclass = HiperdexChapterExtractor pattern = BASE_PATTERN + r"(/mangas?/([^/?#]+))/?$" example = "https://hiperdex.com/manga/MANGA/" def __init__(self, match): root, path, self.manga = match.groups() self.root = text.ensure_http_scheme(root) MangaExtractor.__init__(self, match, self.root + path + "/") def chapters(self, page): data = self.manga_data(self.manga, page) self.page_url = url = data["url"] url = self.page_url + "ajax/chapters/" headers = { "Accept": "*/*", "X-Requested-With": "XMLHttpRequest", "Origin": self.root, "Referer": "https://" + text.quote(self.page_url[8:]), } html = self.request(url, method="POST", headers=headers).text results = [] for item in text.extract_iter( html, '
  • = total: return class HitomiIndexExtractor(HitomiTagExtractor): """Extractor for galleries from index searches on hitomi.la""" subcategory = "index" pattern = r"(?:https?://)?hitomi\.la/(\w+)-(\w+)\.html" example = "https://hitomi.la/index-LANG.html" def __init__(self, match): Extractor.__init__(self, match) self.tag, self.language = match.groups() def items(self): data = {"_extractor": HitomiGalleryExtractor} nozomi_url = (f"https://ltn.{self.domain}" f"/{self.tag}-{self.language}.nozomi") headers = { "Origin": self.root, "Cache-Control": "max-age=0", } offset = 0 total = None while True: headers["Referer"] = (f"{self.root}/{self.tag}-{self.language}" f".html?page={offset // 100 + 1}") headers["Range"] = f"bytes={offset}-{offset + 99}" response = self.request(nozomi_url, headers=headers) for gallery_id in decode_nozomi(response.content): gallery_url = f"{self.root}/galleries/{gallery_id}.html" yield Message.Queue, gallery_url, data offset += 100 if total is None: total = text.parse_int( response.headers["content-range"].rpartition("/")[2]) if offset >= total: return class HitomiSearchExtractor(HitomiExtractor): """Extractor for galleries from multiple tag searches on hitomi.la""" subcategory = "search" pattern = r"(?:https?://)?hitomi\.la/search\.html\?([^#]+)" example = "https://hitomi.la/search.html?QUERY" def items(self): tags = text.unquote(self.groups[0]) data = { "_extractor": HitomiGalleryExtractor, "search_tags": tags, } for gallery_id in self.gallery_ids(tags): gallery_url = f"{self.root}/galleries/{gallery_id}.html" yield Message.Queue, gallery_url, data def gallery_ids(self, tags): result = None positive = [] negative = [] for tag in tags.split(): if tag[0] == "-": negative.append(tag[1:]) else: positive.append(tag) for tag in positive: ids = self.load_nozomi(tag) if result is None: result = set(ids) else: result.intersection_update(ids) if result is None: # result = set(self.load_nozomi("index")) result = set(self.load_nozomi("language:all")) for tag in negative: result.difference_update(self.load_nozomi(tag)) return sorted(result, reverse=True) if result else () @memcache(maxage=1800) def _parse_gg(extr): page = extr.request("https://ltn.gold-usergeneratedcontent.net/gg.js").text m = {} keys = [] for match in util.re_compile( r"case\s+(\d+):(?:\s*o\s*=\s*(\d+))?").finditer(page): key, value = match.groups() keys.append(int(key)) if value: value = int(value) for key in keys: m[key] = value keys.clear() for match in util.re_compile( r"if\s+\(g\s*===?\s*(\d+)\)[\s{]*o\s*=\s*(\d+)").finditer(page): m[int(match[1])] = int(match[2]) d = util.re_compile(r"(?:var\s|default:)\s*o\s*=\s*(\d+)").search(page) b = util.re_compile(r"b:\s*[\"'](.+)[\"']").search(page) return m, b[1].strip("/"), int(d[1]) if d else 0 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/hotleak.py0000644000175000017500000001410715040344700020541 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://hotleak.vip/""" from .common import Extractor, Message from .. 
import text, exception import binascii BASE_PATTERN = r"(?:https?://)?(?:www\.)?hotleak\.vip" class HotleakExtractor(Extractor): """Base class for hotleak extractors""" category = "hotleak" directory_fmt = ("{category}", "{creator}",) filename_fmt = "{creator}_{id}.{extension}" archive_fmt = "{type}_{creator}_{id}" root = "https://hotleak.vip" def items(self): for post in self.posts(): if not post["url"].startswith("ytdl:"): post["url"] = ( post["url"] .replace("/storage/storage/", "/storage/") .replace("_thumb.", ".") ) post["_http_expected_status"] = (404,) yield Message.Directory, post yield Message.Url, post["url"], post def posts(self): """Return an iterable containing relevant posts""" return () def _pagination(self, url, params): params = text.parse_query(params) params["page"] = text.parse_int(params.get("page"), 1) while True: page = self.request(url, params=params).text if "" not in page: return for item in text.extract_iter( page, '
    ', '
    ') data = { "id" : text.parse_int(self.id), "creator": self.creator, "type" : self.type, } if self.type == "photo": data["url"] = text.extr(page, 'data-src="', '"') text.nameext_from_url(data["url"], data) elif self.type == "video": data["url"] = "ytdl:" + decode_video_url(text.extr( text.unescape(page), '"src":"', '"')) text.nameext_from_url(data["url"], data) data["extension"] = "mp4" return (data,) class HotleakCreatorExtractor(HotleakExtractor): """Extractor for all posts from a hotleak creator""" subcategory = "creator" pattern = (BASE_PATTERN + r"/(?!(?:hot|creators|videos|photos)(?:$|/))" r"([^/?#]+)/?$") example = "https://hotleak.vip/MODEL" def __init__(self, match): HotleakExtractor.__init__(self, match) self.creator = match[1] def posts(self): url = f"{self.root}/{self.creator}" return self._pagination(url) def _pagination(self, url): headers = {"X-Requested-With": "XMLHttpRequest"} params = {"page": 1} while True: try: response = self.request( url, headers=headers, params=params, notfound="creator") except exception.HttpError as exc: if exc.response.status_code == 429: self.wait( until=exc.response.headers.get("X-RateLimit-Reset")) continue raise posts = response.json() if not posts: return data = {"creator": self.creator} for post in posts: data["id"] = text.parse_int(post["id"]) if post["type"] == 0: data["type"] = "photo" data["url"] = self.root + "/storage/" + post["image"] text.nameext_from_url(data["url"], data) elif post["type"] == 1: data["type"] = "video" data["url"] = "ytdl:" + decode_video_url( post["stream_url_play"]) text.nameext_from_url(data["url"], data) data["extension"] = "mp4" yield data params["page"] += 1 class HotleakCategoryExtractor(HotleakExtractor): """Extractor for hotleak categories""" subcategory = "category" pattern = BASE_PATTERN + r"/(hot|creators|videos|photos)(?:/?\?([^#]+))?" example = "https://hotleak.vip/photos" def __init__(self, match): HotleakExtractor.__init__(self, match) self._category, self.params = match.groups() def items(self): url = f"{self.root}/{self._category}" if self._category in ("hot", "creators"): data = {"_extractor": HotleakCreatorExtractor} elif self._category in ("videos", "photos"): data = {"_extractor": HotleakPostExtractor} for item in self._pagination(url, self.params): yield Message.Queue, item, data class HotleakSearchExtractor(HotleakExtractor): """Extractor for hotleak search results""" subcategory = "search" pattern = BASE_PATTERN + r"/search(?:/?\?([^#]+))" example = "https://hotleak.vip/search?search=QUERY" def __init__(self, match): HotleakExtractor.__init__(self, match) self.params = match[1] def items(self): data = {"_extractor": HotleakCreatorExtractor} for creator in self._pagination(self.root + "/search", self.params): yield Message.Queue, creator, data ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/idolcomplex.py0000644000175000017500000002214015040344700021425 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2018-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://idol.sankakucomplex.com/""" from .sankaku import SankakuExtractor from .common import Message from ..cache import cache from .. import text, util, exception import collections import re BASE_PATTERN = r"(?:https?://)?idol\.sankakucomplex\.com(?:/[a-z]{2})?" 
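# Illustrative note (not part of the original module): BASE_PATTERN above allows
# an optional two-letter language prefix after the domain, so both of the
# following hypothetical URLs are accepted by the extractors defined below.
# A quick standalone check, assuming only the stdlib "re" module:
#
#     import re
#     assert re.match(BASE_PATTERN, "https://idol.sankakucomplex.com/en/posts?tags=TAG")
#     assert re.match(BASE_PATTERN, "idol.sankakucomplex.com/posts/0123456789abcdef")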
class IdolcomplexExtractor(SankakuExtractor): """Base class for idolcomplex extractors""" category = "idolcomplex" root = "https://idol.sankakucomplex.com" cookies_domain = "idol.sankakucomplex.com" cookies_names = ("_idolcomplex_session",) referer = False request_interval = (3.0, 6.0) def __init__(self, match): SankakuExtractor.__init__(self, match) self.logged_in = True self.start_page = 1 self.start_post = 0 def _init(self): self.find_pids = re.compile( r" href=[\"#]/\w\w/posts/(\w+)" ).findall self.find_tags = re.compile( r'tag-type-([^"]+)">\s*
    ]*?href="/[^?]*\?tags=([^"]+)' ).findall def items(self): self.login() data = self.metadata() for post_id in util.advance(self.post_ids(), self.start_post): post = self._extract_post(post_id) url = post["file_url"] post.update(data) text.nameext_from_url(url, post) yield Message.Directory, post yield Message.Url, url, post def skip(self, num): self.start_post += num return num def post_ids(self): """Return an iterable containing all relevant post ids""" def login(self): if self.cookies_check(self.cookies_names): return username, password = self._get_auth_info() if username: return self.cookies_update(self._login_impl(username, password)) self.logged_in = False @cache(maxage=90*86400, keyarg=1) def _login_impl(self, username, password): self.log.info("Logging in as %s", username) url = self.root + "/users/login" page = self.request(url).text headers = { "Referer": url, } url = self.root + (text.extr(page, '") vcnt = extr('>Votes:', "<") pid = extr(">Post ID:", "<") created = extr(' title="', '"') if file_url := extr('>Original:', 'id='): file_url = extr(' href="', '"') width = extr(">", "x") height = extr("", " ") else: width = extr('') file_url = extr('Rating:", "') for tag_type, tag_name in self.find_tags(tags_html or ""): tags[tag_type].append(text.unquote(tag_name)) for key, value in tags.items(): data["tags_" + key] = " ".join(value) tags_list += value data["tags"] = " ".join(tags_list) return data class IdolcomplexTagExtractor(IdolcomplexExtractor): """Extractor for images from idol.sankakucomplex.com by search-tags""" subcategory = "tag" directory_fmt = ("{category}", "{search_tags}") archive_fmt = "t_{search_tags}_{id}" pattern = BASE_PATTERN + r"/(?:posts/?)?\?([^#]*)" example = "https://idol.sankakucomplex.com/en/posts?tags=TAGS" per_page = 20 def __init__(self, match): IdolcomplexExtractor.__init__(self, match) query = text.parse_query(match[1]) self.tags = text.unquote(query.get("tags", "").replace("+", " ")) self.start_page = text.parse_int(query.get("page"), 1) self.next = text.parse_int(query.get("next"), 0) def skip(self, num): if self.next: self.start_post += num else: pages, posts = divmod(num, self.per_page) self.start_page += pages self.start_post += posts return num def metadata(self): if not self.next: max_page = 50 if self.logged_in else 25 if self.start_page > max_page: self.log.info("Traversing from page %d to page %d", max_page, self.start_page) self.start_post += self.per_page * (self.start_page - max_page) self.start_page = max_page tags = self.tags.split() if not self.logged_in and len(tags) > 4: raise exception.AbortExtraction( "Non-members can only search up to 4 tags at once") return {"search_tags": " ".join(tags)} def post_ids(self): url = self.root + "/en/posts" params = {"auto_page": "t"} if self.next: params["next"] = self.next else: params["page"] = self.start_page params["tags"] = self.tags while True: response = self.request(url, params=params, retries=10) if response.history and "/posts/premium" in response.url: self.log.warning("HTTP redirect to %s", response.url) page = response.text yield from text.extract_iter(page, '"id":"', '"') next_page_url = text.extr(page, 'next-page-url="', '"') if not next_page_url: return url, _, next_params = text.unquote( text.unescape(text.unescape(next_page_url))).partition("?") next_params = text.parse_query(next_params) if "next" in next_params: # stop if the same "next" value occurs twice in a row (#265) if "next" in params and params["next"] == next_params["next"]: return next_params["page"] = "2" if url[0] == 
"/": url = self.root + url params = next_params class IdolcomplexPoolExtractor(IdolcomplexExtractor): """Extractor for image-pools from idol.sankakucomplex.com""" subcategory = "pool" directory_fmt = ("{category}", "pool", "{pool}") archive_fmt = "p_{pool}_{id}" pattern = BASE_PATTERN + r"/pools?/(?:show/)?(\w+)" example = "https://idol.sankakucomplex.com/pools/0123456789abcdef" per_page = 24 def skip(self, num): pages, posts = divmod(num, self.per_page) self.start_page += pages self.start_post += posts return num def metadata(self): return {"pool": self.groups[0]} def post_ids(self): if not self.logged_in: self.log.warning("Login required") url = self.root + "/pools/show/" + self.groups[0] params = {"page": self.start_page} while True: page = self.request(url, params=params, retries=10).text pos = page.find('id="pool-show"') + 1 post_ids = self.find_pids(page, pos) yield from post_ids if len(post_ids) < self.per_page: return params["page"] += 1 class IdolcomplexPostExtractor(IdolcomplexExtractor): """Extractor for single images from idol.sankakucomplex.com""" subcategory = "post" archive_fmt = "{id}" pattern = BASE_PATTERN + r"/posts?/(?:show/)?(\w+)" example = "https://idol.sankakucomplex.com/posts/0123456789abcdef" def post_ids(self): return (self.groups[0],) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/imagebam.py0000644000175000017500000000641615040344700020660 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2014-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.imagebam.com/""" from .common import Extractor, Message from .. 
import text, util class ImagebamExtractor(Extractor): """Base class for imagebam extractors""" category = "imagebam" root = "https://www.imagebam.com" def __init__(self, match): Extractor.__init__(self, match) self.path = match[1] def _init(self): self.cookies.set("nsfw_inter", "1", domain="www.imagebam.com") def _parse_image_page(self, path): page = self.request(self.root + path).text url, pos = text.extract(page, '', '<').strip())} def images(self, page): findall = util.re(r'Not Found" in page: raise exception.NotFoundError("gallery") self.files = () return {} self.files = post.pop("files", ()) post["gallery_id"] = self.gallery_id post["tags"] = [tag["name"] for tag in post["tags"]] return post def _metadata_api(self, page): post = self.api.post(self.gallery_id) post["date"] = text.parse_datetime( post["created"], "%Y-%m-%dT%H:%M:%S.%fZ") for img in post["images"]: img["date"] = text.parse_datetime( img["created"], "%Y-%m-%dT%H:%M:%S.%fZ") post["gallery_id"] = self.gallery_id post.pop("image_count", None) self.files = post.pop("images") return post def images(self, page): try: return [ (file["link"], file) for file in self.files ] except Exception: return () class ImagechestUserExtractor(Extractor): """Extractor for imgchest.com user profiles""" category = "imagechest" subcategory = "user" root = "https://imgchest.com" pattern = BASE_PATTERN + r"/u/([^/?#]+)" example = "https://imgchest.com/u/USER" def items(self): url = self.root + "/api/posts" params = { "page" : 1, "sort" : "new", "tag" : "", "q" : "", "username": text.unquote(self.groups[0]), "nsfw" : "true", } while True: try: data = self.request_json(url, params=params)["data"] except (TypeError, KeyError): return if not data: return for gallery in data: gallery["_extractor"] = ImagechestGalleryExtractor yield Message.Queue, gallery["link"], gallery params["page"] += 1 class ImagechestAPI(): """Interface for the Image Chest API https://imgchest.com/docs/api/1.0/general/overview """ root = "https://api.imgchest.com" def __init__(self, extractor, access_token): self.extractor = extractor self.headers = {"Authorization": "Bearer " + access_token} def file(self, file_id): endpoint = "/v1/file/" + file_id return self._call(endpoint) def post(self, post_id): endpoint = "/v1/post/" + post_id return self._call(endpoint) def user(self, username): endpoint = "/v1/user/" + username return self._call(endpoint) def _call(self, endpoint): url = self.root + endpoint while True: response = self.extractor.request( url, headers=self.headers, fatal=None, allow_redirects=False) if response.status_code < 300: return response.json()["data"] elif response.status_code < 400: raise exception.AuthenticationError("Invalid API access token") elif response.status_code == 429: self.extractor.wait(seconds=600) else: self.extractor.log.debug(response.text) raise exception.AbortExtraction("API request failed") ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/imagefap.py0000644000175000017500000002037415040344700020666 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2016-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.imagefap.com/""" from .common import Extractor, Message from .. 
import text, exception BASE_PATTERN = r"(?:https?://)?(?:www\.|beta\.)?imagefap\.com" class ImagefapExtractor(Extractor): """Base class for imagefap extractors""" category = "imagefap" root = "https://www.imagefap.com" directory_fmt = ("{category}", "{gallery_id} {title}") filename_fmt = ("{category}_{gallery_id}_{num:?/_/>04}" "{filename}.{extension}") archive_fmt = "{gallery_id}_{image_id}" request_interval = (2.0, 4.0) def request(self, url, **kwargs): response = Extractor.request(self, url, **kwargs) if response.history and response.url.endswith("/human-verification"): if msg := text.extr(response.text, '
    ")[2].split()) raise exception.AbortExtraction(f"'{msg}'") self.log.warning("HTTP redirect to %s", response.url) return response class ImagefapGalleryExtractor(ImagefapExtractor): """Extractor for image galleries from imagefap.com""" subcategory = "gallery" pattern = BASE_PATTERN + r"/(?:gallery\.php\?gid=|gallery/|pictures/)(\d+)" example = "https://www.imagefap.com/gallery/12345" def __init__(self, match): ImagefapExtractor.__init__(self, match) self.gid = match[1] self.image_id = "" def items(self): url = f"{self.root}/gallery/{self.gid}" page = self.request(url).text data = self.get_job_metadata(page) yield Message.Directory, data for url, image in self.get_images(): data.update(image) yield Message.Url, url, data def get_job_metadata(self, page): """Collect metadata for extractor-job""" extr = text.extract_from(page) data = { "gallery_id": text.parse_int(self.gid), "uploader": extr("porn picture gallery by ", " to see hottest"), "title": text.unescape(extr("", "<")), "description": text.unescape(extr( 'id="gdesc_text"', '<').partition(">")[2]), "categories": text.split_html(extr( 'id="cnt_cats"', '</div>'))[1::2], "tags": text.split_html(extr( 'id="cnt_tags"', '</div>'))[1::2], "count": text.parse_int(extr(' 1 of ', ' pics"')), } self.image_id = extr('id="img_ed_', '"') self._count = data["count"] return data def get_images(self): """Collect image-urls and -metadata""" url = f"{self.root}/photo/{self.image_id}/" params = {"gid": self.gid, "idx": 0, "partial": "true"} headers = { "Content-Type": "application/x-www-form-urlencoded", "X-Requested-With": "XMLHttpRequest", "Referer": f"{url}?pgid=&gid={self.image_id}&page=0" } num = 0 total = self._count while True: page = self.request(url, params=params, headers=headers).text cnt = 0 for image_url in text.extract_iter(page, '<a href="', '"'): num += 1 cnt += 1 data = text.nameext_from_url(image_url) data["num"] = num data["image_id"] = text.parse_int(data["filename"]) yield image_url, data if not cnt or cnt < 24 and num >= total: return params["idx"] += cnt class ImagefapImageExtractor(ImagefapExtractor): """Extractor for single images from imagefap.com""" subcategory = "image" pattern = BASE_PATTERN + r"/photo/(\d+)" example = "https://www.imagefap.com/photo/12345" def __init__(self, match): ImagefapExtractor.__init__(self, match) self.image_id = match[1] def items(self): url, data = self.get_image() yield Message.Directory, data yield Message.Url, url, data def get_image(self): url = f"{self.root}/photo/{self.image_id}/" page = self.request(url).text url, pos = text.extract( page, 'original="', '"') image_id, pos = text.extract( page, 'id="imageid_input" value="', '"', pos) gallery_id, pos = text.extract( page, 'id="galleryid_input" value="', '"', pos) info = self._extract_jsonld(page) return url, text.nameext_from_url(url, { "title": text.unescape(info["name"]), "uploader": info["author"], "date": info["datePublished"], "width": text.parse_int(info["width"]), "height": text.parse_int(info["height"]), "gallery_id": text.parse_int(gallery_id), "image_id": text.parse_int(image_id), }) class ImagefapFolderExtractor(ImagefapExtractor): """Extractor for imagefap user folders""" subcategory = "folder" pattern = (BASE_PATTERN + r"/(?:organizer/|" r"(?:usergallery\.php\?user(id)?=([^&#]+)&" r"|profile/([^/?#]+)/galleries\?)folderid=)(\d+|-1)") example = "https://www.imagefap.com/organizer/12345" def __init__(self, match): ImagefapExtractor.__init__(self, match) self._id, user, profile, self.folder_id = match.groups() self.user = user or 
profile def items(self): for gallery_id, name, folder in self.galleries(self.folder_id): url = f"{self.root}/gallery/{gallery_id}" data = { "gallery_id": gallery_id, "title" : text.unescape(name), "folder" : text.unescape(folder), "_extractor": ImagefapGalleryExtractor, } yield Message.Queue, url, data def galleries(self, folder_id): """Yield gallery IDs and titles of a folder""" if folder_id == "-1": folder_name = "Uncategorized" if self._id: url = (f"{self.root}/usergallery.php" f"?userid={self.user}&folderid=-1") else: url = f"{self.root}/profile/{self.user}/galleries?folderid=-1" else: folder_name = None url = f"{self.root}/organizer/{folder_id}/" params = {"page": 0} extr = text.extract_from(self.request(url, params=params).text) if not folder_name: folder_name = extr("class'blk_galleries'><b>", "</b>") while True: cnt = 0 while True: gid = extr(' id="gid-', '"') if not gid: break yield gid, extr("<b>", "<"), folder_name cnt += 1 if cnt < 20: break params["page"] += 1 extr = text.extract_from(self.request(url, params=params).text) class ImagefapUserExtractor(ImagefapExtractor): """Extractor for an imagefap user profile""" subcategory = "user" pattern = (BASE_PATTERN + r"/(?:profile(?:\.php\?user=|/)([^/?#]+)(?:/galleries)?" r"|usergallery\.php\?userid=(\d+))(?:$|#)") example = "https://www.imagefap.com/profile/USER" def __init__(self, match): ImagefapExtractor.__init__(self, match) self.user, self.user_id = match.groups() def items(self): data = {"_extractor": ImagefapFolderExtractor} for folder_id in self.folders(): if folder_id == "-1": url = f"{self.root}/profile/{self.user}/galleries?folderid=-1" else: url = f"{self.root}/organizer/{folder_id}/" yield Message.Queue, url, data def folders(self): """Return a list of folder IDs of a user""" if self.user: url = f"{self.root}/profile/{self.user}/galleries" else: url = f"{self.root}/usergallery.php?userid={self.user_id}" response = self.request(url) self.user = response.url.split("/")[-2] folders = text.extr(response.text, ' id="tgl_all" value="', '"') return folders.rstrip("|").split("|") ��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1753461479.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.30.2/gallery_dl/extractor/imagehosts.py������������������������������������������������0000644�0001750�0001750�00000032510�15040731347�021262� 
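# A minimal, standalone illustration (hypothetical helper name and sample value;
# not part of imagefap.py above) of the folder-ID parsing done in
# ImagefapUserExtractor.folders(): the hidden "tgl_all" value is a "|"-separated
# list of folder IDs, where "-1" stands for the "Uncategorized" folder handled
# by ImagefapFolderExtractor.

def _split_folder_ids(value):
    # mirrors: folders.rstrip("|").split("|")
    return value.rstrip("|").split("|")


assert _split_folder_ids("123456|789012|-1|") == ["123456", "789012", "-1"]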
0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2016-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Collection of extractors for various imagehosts""" from .common import Extractor, Message from .. import text, exception from ..cache import memcache from os.path import splitext class ImagehostImageExtractor(Extractor): """Base class for single-image extractors for various imagehosts""" basecategory = "imagehost" subcategory = "image" archive_fmt = "{token}" _https = True _params = None _cookies = None _encoding = None _validate = None def __init__(self, match): Extractor.__init__(self, match) self.page_url = f"http{'s' if self._https else ''}://{match[1]}" self.token = match[2] if self._params == "simple": self._params = { "imgContinue": "Continue+to+image+...+", } elif self._params == "complex": self._params = { "op": "view", "id": self.token, "pre": "1", "adb": "1", "next": "Continue+to+image+...+", } def items(self): page = self.request( self.page_url, method=("POST" if self._params else "GET"), data=self._params, cookies=self._cookies, encoding=self._encoding, ).text url, filename = self.get_info(page) data = text.nameext_from_url(filename, {"token": self.token}) data.update(self.metadata(page)) if self._https and url.startswith("http:"): url = "https:" + url[5:] if self._validate is not None: data["_http_validate"] = self._validate yield Message.Directory, data yield Message.Url, url, data def get_info(self, page): """Find image-url and string to get filename from""" def metadata(self, page): """Return additional metadata""" return () class ImxtoImageExtractor(ImagehostImageExtractor): """Extractor for single images from imx.to""" category = "imxto" pattern = (r"(?:https?://)?(?:www\.)?((?:imx\.to|img\.yt)" r"/(?:i/|img-)(\w+)(\.html)?)") example = "https://imx.to/i/ID" _params = "simple" _encoding = "utf-8" def __init__(self, match): ImagehostImageExtractor.__init__(self, match) if "/img-" in self.page_url: self.page_url = self.page_url.replace("img.yt", "imx.to") self.url_ext = True else: self.url_ext = False def get_info(self, page): url, pos = text.extract( page, '<div style="text-align:center;"><a href="', '"') if not url: raise exception.NotFoundError("image") filename, pos = text.extract(page, ' title="', '"', pos) if self.url_ext and filename: filename += splitext(url)[1] return url, filename or url def metadata(self, page): extr = text.extract_from(page, page.index("[ FILESIZE <")) size = extr(">", "</span>").replace(" ", "")[:-1] width, _, height = extr(">", " px</span>").partition("x") return { "size" : text.parse_bytes(size), "width" : text.parse_int(width), "height": text.parse_int(height), "hash" : extr(">", "</span>"), } class ImxtoGalleryExtractor(ImagehostImageExtractor): """Extractor for image galleries from imx.to""" category = "imxto" subcategory = "gallery" pattern = r"(?:https?://)?(?:www\.)?(imx\.to/g/([^/?#]+))" example = "https://imx.to/g/ID" def items(self): page = self.request(self.page_url).text title, pos = text.extract(page, '<div class="title', '<') data = { 
"_extractor": ImxtoImageExtractor, "title": text.unescape(title.partition(">")[2]).strip(), } for url in text.extract_iter(page, "<a href=", " ", pos): yield Message.Queue, url.strip("\"'"), data class AcidimgImageExtractor(ImagehostImageExtractor): """Extractor for single images from acidimg.cc""" category = "acidimg" pattern = r"(?:https?://)?((?:www\.)?acidimg\.cc/img-([a-z0-9]+)\.html)" example = "https://acidimg.cc/img-abc123.html" _params = "simple" _encoding = "utf-8" def get_info(self, page): url, pos = text.extract(page, "<img class='centred' src='", "'") if not url: url, pos = text.extract(page, '<img class="centred" src="', '"') if not url: raise exception.NotFoundError("image") filename, pos = text.extract(page, "alt='", "'", pos) if not filename: filename, pos = text.extract(page, 'alt="', '"', pos) return url, (filename + splitext(url)[1]) if filename else url class ImagevenueImageExtractor(ImagehostImageExtractor): """Extractor for single images from imagevenue.com""" category = "imagevenue" pattern = (r"(?:https?://)?((?:www|img\d+)\.imagevenue\.com" r"/([A-Z0-9]{8,10}|view/.*|img\.php\?.*))") example = "https://www.imagevenue.com/ME123456789" def get_info(self, page): pos = page.index('class="card-body') url, pos = text.extract(page, '<img src="', '"', pos) if url.endswith("/loader.svg"): url, pos = text.extract(page, '<img src="', '"', pos) filename, pos = text.extract(page, 'alt="', '"', pos) return url, text.unescape(filename) def _validate(self, response): hget = response.headers.get return not ( hget("content-length") == "14396" and hget("content-type") == "image/jpeg" and hget("last-modified") == "Mon, 04 May 2020 07:19:52 GMT" ) class ImagetwistImageExtractor(ImagehostImageExtractor): """Extractor for single images from imagetwist.com""" category = "imagetwist" pattern = (r"(?:https?://)?((?:www\.|phun\.)?" r"image(?:twist|haha)\.com/([a-z0-9]{12}))") example = "https://imagetwist.com/123456abcdef/NAME.EXT" @property @memcache(maxage=3*3600) def _cookies(self): return self.request(self.page_url).cookies def get_info(self, page): url , pos = text.extract(page, '<img src="', '"') filename, pos = text.extract(page, ' alt="', '"', pos) return url, filename class ImagetwistGalleryExtractor(ImagehostImageExtractor): """Extractor for galleries from imagetwist.com""" category = "imagetwist" subcategory = "gallery" pattern = (r"(?:https?://)?((?:www\.|phun\.)?" 
r"image(?:twist|haha)\.com/(p/[^/?#]+/\d+))") example = "https://imagetwist.com/p/USER/12345/NAME" def items(self): data = {"_extractor": ImagetwistImageExtractor} root = self.page_url[:self.page_url.find("/", 8)] page = self.request(self.page_url).text gallery = text.extr(page, 'class="gallerys', "</div") for path in text.extract_iter(gallery, ' href="', '"'): yield Message.Queue, root + path, data class ImgadultImageExtractor(ImagehostImageExtractor): """Extractor for single images from imgadult.com""" category = "imgadult" _cookies = {"img_i_d": "1"} pattern = r"(?:https?://)?((?:www\.)?imgadult\.com/img-([0-9a-f]+)\.html)" example = "https://imgadult.com/img-0123456789abc.html" def get_info(self, page): url , pos = text.extract(page, "' src='", "'") name, pos = text.extract(page, "alt='", "'", pos) if name: name, _, rhs = name.rpartition(" image hosted at ImgAdult.com") if not name: name = rhs name = text.unescape(name) return url, name class ImgspiceImageExtractor(ImagehostImageExtractor): """Extractor for single images from imgspice.com""" category = "imgspice" pattern = r"(?:https?://)?((?:www\.)?imgspice\.com/([^/?#]+))" example = "https://imgspice.com/ID/NAME.EXT.html" def get_info(self, page): pos = page.find('id="imgpreview"') if pos < 0: raise exception.NotFoundError("image") url , pos = text.extract(page, 'src="', '"', pos) name, pos = text.extract(page, 'alt="', '"', pos) return url, text.unescape(name) class PixhostImageExtractor(ImagehostImageExtractor): """Extractor for single images from pixhost.to""" category = "pixhost" pattern = (r"(?:https?://)?((?:www\.)?pixhost\.(?:to|org)" r"/show/\d+/(\d+)_[^/?#]+)") example = "https://pixhost.to/show/123/12345_NAME.EXT" _cookies = {"pixhostads": "1", "pixhosttest": "1"} def get_info(self, page): url , pos = text.extract(page, "class=\"image-img\" src=\"", "\"") filename, pos = text.extract(page, "alt=\"", "\"", pos) return url, filename class PixhostGalleryExtractor(ImagehostImageExtractor): """Extractor for image galleries from pixhost.to""" category = "pixhost" subcategory = "gallery" pattern = (r"(?:https?://)?((?:www\.)?pixhost\.(?:to|org)" r"/gallery/([^/?#]+))") example = "https://pixhost.to/gallery/ID" def items(self): page = text.extr(self.request( self.page_url).text, 'class="images"', "</div>") data = {"_extractor": PixhostImageExtractor} for url in text.extract_iter(page, '<a href="', '"'): yield Message.Queue, url, data class PostimgImageExtractor(ImagehostImageExtractor): """Extractor for single images from postimages.org""" category = "postimg" pattern = (r"(?:https?://)?((?:www\.)?(?:postim(?:ages|g)|pixxxels)" r"\.(?:cc|org)/(?!gallery/)(?:image/)?([^/?#]+)/?)") example = "https://postimages.org/ID" def get_info(self, page): pos = page.index(' id="download"') url , pos = text.rextract(page, ' href="', '"', pos) filename, pos = text.extract(page, 'class="imagename">', '<', pos) return url, text.unescape(filename) class PostimgGalleryExtractor(ImagehostImageExtractor): """Extractor for images galleries from postimages.org""" category = "postimg" subcategory = "gallery" pattern = (r"(?:https?://)?((?:www\.)?(?:postim(?:ages|g)|pixxxels)" r"\.(?:cc|org)/gallery/([^/?#]+))") example = "https://postimages.org/gallery/ID" def items(self): page = self.request(self.page_url).text data = {"_extractor": PostimgImageExtractor} for url in text.extract_iter(page, ' class="thumb"><a href="', '"'): yield Message.Queue, url, data class TurboimagehostImageExtractor(ImagehostImageExtractor): """Extractor for single images from 
www.turboimagehost.com""" category = "turboimagehost" pattern = (r"(?:https?://)?((?:www\.)?turboimagehost\.com" r"/p/(\d+)/[^/?#]+\.html)") example = "https://www.turboimagehost.com/p/12345/NAME.EXT.html" def get_info(self, page): url = text.extract(page, 'src="', '"', page.index("<img "))[0] return url, url class TurboimagehostGalleryExtractor(ImagehostImageExtractor): """Extractor for image galleries from turboimagehost.com""" category = "turboimagehost" subcategory = "gallery" pattern = (r"(?:https?://)?((?:www\.)?turboimagehost\.com" r"/album/(\d+)/([^/?#]*))") example = "https://www.turboimagehost.com/album/12345/GALLERY_NAME" def items(self): data = {"_extractor": TurboimagehostImageExtractor} params = {"p": 1} while True: page = self.request(self.page_url, params=params).text if params["p"] == 1 and \ "Requested gallery don`t exist on our website." in page: raise exception.NotFoundError("gallery") thumb_url = None for thumb_url in text.extract_iter(page, '"><a href="', '"'): yield Message.Queue, thumb_url, data if thumb_url is None: return params["p"] += 1 class ViprImageExtractor(ImagehostImageExtractor): """Extractor for single images from vipr.im""" category = "vipr" pattern = r"(?:https?://)?(vipr\.im/(\w+))" example = "https://vipr.im/abc123.html" def get_info(self, page): url = text.extr(page, '<img src="', '"') return url, url class ImgclickImageExtractor(ImagehostImageExtractor): """Extractor for single images from imgclick.net""" category = "imgclick" pattern = r"(?:https?://)?((?:www\.)?imgclick\.net/([^/?#]+))" example = "http://imgclick.net/abc123/NAME.EXT.html" _https = False _params = "complex" def get_info(self, page): url , pos = text.extract(page, '<br><img src="', '"') filename, pos = text.extract(page, 'alt="', '"', pos) return url, filename class FappicImageExtractor(ImagehostImageExtractor): """Extractor for single images from fappic.com""" category = "fappic" pattern = r"(?:https?://)?((?:www\.)?fappic\.com/(\w+)/[^/?#]+)" example = "https://fappic.com/abc123/NAME.EXT" def get_info(self, page): url , pos = text.extract(page, '<a href="#"><img src="', '"') filename, pos = text.extract(page, 'alt="', '"', pos) if filename.startswith("Porn-Picture-"): filename = filename[13:] return url, filename ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1753336256.0 
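# A rough sketch (not part of the original distribution) of how another
# single-image host could hook into the ImagehostImageExtractor base class
# defined in imagehosts.py above.  "examplehost.tld" and its markup markers are
# made up; a subclass typically only needs category, pattern, example, and a
# get_info() implementation.  Assumes the same module namespace as
# imagehosts.py (ImagehostImageExtractor, text).

class ExamplehostImageExtractor(ImagehostImageExtractor):
    """Hypothetical extractor for single images from examplehost.tld"""
    category = "examplehost"
    pattern = r"(?:https?://)?((?:www\.)?examplehost\.tld/img-(\w+)\.html)"
    example = "https://examplehost.tld/img-abc123.html"

    def get_info(self, page):
        # find the direct image URL and a string to derive the filename from
        url, pos = text.extract(page, '<img class="centred" src="', '"')
        filename, pos = text.extract(page, ' alt="', '"', pos)
        return url, filename or url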
����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.30.2/gallery_dl/extractor/imgbb.py�����������������������������������������������������0000644�0001750�0001750�00000016717�15040344700�020203� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2019-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://imgbb.com/""" from .common import Extractor, Message from .. import text, util, exception from ..cache import cache class ImgbbExtractor(Extractor): """Base class for imgbb extractors""" category = "imgbb" directory_fmt = ("{category}", "{user}") filename_fmt = "{title} {id}.{extension}" archive_fmt = "{id}" root = "https://imgbb.com" def __init__(self, match): Extractor.__init__(self, match) self.page_url = self.sort = None def items(self): self.login() url = self.page_url params = {"sort": self.sort} while True: response = self.request(url, params=params, allow_redirects=False) if response.status_code < 300: break url = response.headers["location"] if url.startswith(self.root): raise exception.NotFoundError(self.subcategory) page = response.text data = self.metadata(page) first = True for img in self.images(page): image = { "id" : img["url_viewer"].rpartition("/")[2], "user" : img["user"]["username"] if "user" in img else "", "title" : text.unescape(img["title"]), "url" : img["image"]["url"], "extension": img["image"]["extension"], "size" : text.parse_int(img["image"]["size"]), "width" : text.parse_int(img["width"]), "height" : text.parse_int(img["height"]), } image.update(data) if first: first = False yield Message.Directory, data yield Message.Url, image["url"], image def login(self): username, password = self._get_auth_info() if username: self.cookies_update(self._login_impl(username, password)) @cache(maxage=365*86400, keyarg=1) def _login_impl(self, username, password): self.log.info("Logging in as %s", username) url = self.root + "/login" page = self.request(url).text token = text.extr(page, 'PF.obj.config.auth_token="', '"') headers = {"Referer": url} data = { "auth_token" : token, "login-subject": username, "password" : password, } response = self.request(url, method="POST", headers=headers, data=data) if not response.history: raise exception.AuthenticationError() return self.cookies def _extract_resource(self, page): return util.json_loads(text.extr( page, "CHV.obj.resource=", "};") + "}") def _extract_user(self, page): return self._extract_resource(page).get("user") or {} def _pagination(self, page, endpoint, params): data = None seek, pos = text.extract(page, 'data-seek="', '"') tokn, pos = text.extract(page, 
'PF.obj.config.auth_token="', '"', pos) params["action"] = "list" params["list"] = "images" params["sort"] = self.sort params["seek"] = seek params["page"] = 2 params["auth_token"] = tokn while True: for img in text.extract_iter(page, "data-object='", "'"): yield util.json_loads(text.unquote(img)) if data: if not data["seekEnd"] or params["seek"] == data["seekEnd"]: return params["seek"] = data["seekEnd"] params["page"] += 1 elif not seek or 'class="pagination-next"' not in page: return data = self.request_json(endpoint, method="POST", data=params) page = data["html"] class ImgbbAlbumExtractor(ImgbbExtractor): """Extractor for albums on imgbb.com""" subcategory = "album" directory_fmt = ("{category}", "{user}", "{album_name} {album_id}") pattern = r"(?:https?://)?ibb\.co/album/([^/?#]+)/?(?:\?([^#]+))?" example = "https://ibb.co/album/ID" def __init__(self, match): ImgbbExtractor.__init__(self, match) self.album_name = None self.album_id = match[1] self.sort = text.parse_query(match[2]).get("sort", "date_desc") self.page_url = "https://ibb.co/album/" + self.album_id def metadata(self, page): album = text.extr(page, '"og:title" content="', '"') user = self._extract_user(page) return { "album_id" : self.album_id, "album_name" : text.unescape(album), "user" : user.get("username") or "", "user_id" : user.get("id") or "", "displayname": user.get("name") or "", } def images(self, page): url = text.extr(page, '"og:url" content="', '"') album_id = url.rpartition("/")[2].partition("?")[0] return self._pagination(page, "https://ibb.co/json", { "from" : "album", "albumid" : album_id, "params_hidden[list]" : "images", "params_hidden[from]" : "album", "params_hidden[albumid]": album_id, }) class ImgbbUserExtractor(ImgbbExtractor): """Extractor for user profiles in imgbb.com""" subcategory = "user" pattern = r"(?:https?://)?([\w-]+)\.imgbb\.com/?(?:\?([^#]+))?$" example = "https://USER.imgbb.com" def __init__(self, match): ImgbbExtractor.__init__(self, match) self.user = match[1] self.sort = text.parse_query(match[2]).get("sort", "date_desc") self.page_url = f"https://{self.user}.imgbb.com/" def metadata(self, page): user = self._extract_user(page) return { "user" : user.get("username") or self.user, "user_id" : user.get("id") or "", "displayname": user.get("name") or "", } def images(self, page): user = text.extr(page, '.obj.resource={"id":"', '"') return self._pagination(page, self.page_url + "json", { "from" : "user", "userid" : user, "params_hidden[userid]": user, "params_hidden[from]" : "user", }) class ImgbbImageExtractor(ImgbbExtractor): subcategory = "image" pattern = r"(?:https?://)?ibb\.co/(?!album/)([^/?#]+)" example = "https://ibb.co/ID" def __init__(self, match): ImgbbExtractor.__init__(self, match) self.image_id = match[1] def items(self): url = "https://ibb.co/" + self.image_id page = self.request(url).text extr = text.extract_from(page) user = self._extract_user(page) image = { "id" : self.image_id, "title" : text.unescape(extr( '"og:title" content="', ' hosted at ImgBB"')), "url" : extr('"og:image" content="', '"'), "width" : text.parse_int(extr('"og:image:width" content="', '"')), "height": text.parse_int(extr('"og:image:height" content="', '"')), "user" : user.get("username") or "", "user_id" : user.get("id") or "", "displayname": user.get("name") or "", } image["extension"] = text.ext_from_url(image["url"]) yield Message.Directory, image yield Message.Url, image["url"], image 
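
# Illustrative sketch (not part of imgbb.py): the _pagination() helper above
# drives imgbb's JSON endpoint with a "seek" token and an incrementing page
# number, and stops once the server-provided "seekEnd" token no longer
# advances (or no items are returned).  The loop below reproduces that
# control flow against a stubbed fetch(); the stub data and function names
# are invented for this example.

def fetch(params):
    # stand-in for request_json(); returns three canned "pages" of results
    pages = {
        1: {"items": ["a", "b"], "seekEnd": "tok2"},
        2: {"items": ["c"], "seekEnd": "tok3"},
        3: {"items": [], "seekEnd": "tok3"},
    }
    return pages[params["page"]]


def paginate(seek):
    params = {"seek": seek, "page": 1}
    while True:
        data = fetch(params)
        yield from data["items"]
        if not data["items"] or params["seek"] == data["seekEnd"]:
            return
        params["seek"] = data["seekEnd"]
        params["page"] += 1


print(list(paginate("tok1")))  # ['a', 'b', 'c']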
gallery_dl-1.30.2/gallery_dl/extractor/imgbox.py

# -*- coding: utf-8 -*-

# Copyright 2014-2025 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for https://imgbox.com/"""

from .common import Extractor, Message, AsynchronousMixin
from ..
import text, util, exception class ImgboxExtractor(Extractor): """Base class for imgbox extractors""" category = "imgbox" root = "https://imgbox.com" def items(self): data = self.get_job_metadata() yield Message.Directory, data for image_key in self.get_image_keys(): imgpage = self.request(self.root + "/" + image_key).text imgdata = self.get_image_metadata(imgpage) if imgdata["filename"]: imgdata.update(data) imgdata["image_key"] = image_key text.nameext_from_url(imgdata["filename"], imgdata) yield Message.Url, self.get_image_url(imgpage), imgdata def get_job_metadata(self): """Collect metadata for extractor-job""" return {} def get_image_keys(self): """Return an iterable containing all image-keys""" return [] def get_image_metadata(self, page): """Collect metadata for a downloadable file""" return text.extract_all(page, ( ("num" , '</a>   ', ' of '), (None , 'class="image-container"', ''), ("filename" , ' title="', '"'), ))[0] def get_image_url(self, page): """Extract download-url""" return text.extr(page, 'property="og:image" content="', '"') class ImgboxGalleryExtractor(AsynchronousMixin, ImgboxExtractor): """Extractor for image galleries from imgbox.com""" subcategory = "gallery" directory_fmt = ("{category}", "{title} - {gallery_key}") filename_fmt = "{num:>03}-{filename}.{extension}" archive_fmt = "{gallery_key}_{image_key}" pattern = r"(?:https?://)?(?:www\.)?imgbox\.com/g/([A-Za-z0-9]{10})" example = "https://imgbox.com/g/12345abcde" def __init__(self, match): ImgboxExtractor.__init__(self, match) self.gallery_key = match[1] self.image_keys = [] def get_job_metadata(self): page = self.request(self.root + "/g/" + self.gallery_key).text if "The specified gallery could not be found." in page: raise exception.NotFoundError("gallery") self.image_keys = util.re( r'<a href="/([^"]+)"><img alt="').findall(page) title = text.extr(page, "<h1>", "</h1>") title, _, count = title.rpartition(" - ") return { "gallery_key": self.gallery_key, "title": text.unescape(title), "count": count[:-7], } def get_image_keys(self): return self.image_keys class ImgboxImageExtractor(ImgboxExtractor): """Extractor for single images from imgbox.com""" subcategory = "image" archive_fmt = "{image_key}" pattern = r"(?:https?://)?(?:www\.)?imgbox\.com/([A-Za-z0-9]{8})" example = "https://imgbox.com/1234abcd" def __init__(self, match): ImgboxExtractor.__init__(self, match) self.image_key = match[1] def get_image_keys(self): return (self.image_key,) def get_image_metadata(self, page): data = ImgboxExtractor.get_image_metadata(self, page) if not data["filename"]: raise exception.NotFoundError("image") return data ������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1753336256.0 
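
# Illustrative sketch (not part of imgbox.py): get_image_metadata() above
# reads several fields in document order -- text.extract_all() walks the page
# once, advancing a shared cursor from one delimiter pair to the next.  The
# snippet below is a tiny standalone version of that ordered-extraction idea;
# the sample HTML and field names are made up for the demonstration.

def extract_all(page, rules):
    """Apply (key, begin, end) rules in order, sharing one cursor."""
    data, pos = {}, 0
    for key, begin, end in rules:
        start = page.find(begin, pos)
        if start < 0:
            continue
        start += len(begin)
        stop = page.find(end, start)
        if key is not None:
            data[key] = page[start:stop]
        pos = stop + len(end)
    return data


sample = '<span>3</span> of 9 ... <img id="img" title="photo.jpg" src="/x.png">'
print(extract_all(sample, (
    ("num"     , "<span>", "</span>"),
    (None      , 'id="img"', ' '),
    ("filename", 'title="', '"'),
)))  # {'num': '3', 'filename': 'photo.jpg'}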
gallery_dl-1.30.2/gallery_dl/extractor/imgth.py

# -*- coding: utf-8 -*-

# Copyright 2015-2025 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for https://imgth.com/"""

from .common import GalleryExtractor
from .. import text


class ImgthGalleryExtractor(GalleryExtractor):
    """Extractor for image galleries from imgth.com"""
    category = "imgth"
    root = "https://imgth.com"
    pattern = r"(?:https?://)?(?:www\.)?imgth\.com/gallery/(\d+)"
    example = "https://imgth.com/gallery/123/TITLE"

    def __init__(self, match):
        self.gallery_id = gid = match[1]
        url = f"{self.root}/gallery/{gid}/g/"
        GalleryExtractor.__init__(self, match, url)

    def metadata(self, page):
        extr = text.extract_from(page)
        return {
            "gallery_id": text.parse_int(self.gallery_id),
            "title": text.unescape(extr("<h1>", "</h1>")),
            "count": text.parse_int(extr(
                "total of images in this gallery: ", " ")),
            "date" : text.parse_datetime(
                extr("created on ", " by <")
                .replace("th, ", " ", 1).replace("nd, ", " ", 1)
                .replace("st, ", " ", 1), "%B %d %Y at %H:%M"),
            "user" : text.unescape(extr(">", "<")),
        }

    def images(self, page):
        pnum = 0
        while True:
            thumbs = text.extr(page, '<ul class="thumbnails">', '</ul>')
            for url in text.extract_iter(thumbs, '<img src="', '"'):
                path = url.partition("/thumbs/")[2]
                yield (f"{self.root}/images/{path}", None)
            if '<li class="next">' not in page:
                return
            pnum += 1
            url = f"{self.root}/gallery/{self.gallery_id}/g/page/{pnum}"
            page = self.request(url).text
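
# Illustrative sketch (not part of the original file): imgth renders creation
# dates like "July 22nd, 2015 at 14:30", which strptime() cannot digest
# directly, so metadata() above strips the English ordinal suffix before
# applying "%B %d %Y at %H:%M".  A self-contained version of that
# normalisation (the sample string is invented; the original handles the
# st/nd/th suffixes, "rd" is added here for completeness):

from datetime import datetime


def parse_imgth_date(value):
    for suffix in ("st, ", "nd, ", "rd, ", "th, "):
        value = value.replace(suffix, " ", 1)
    return datetime.strptime(value, "%B %d %Y at %H:%M")


print(parse_imgth_date("July 22nd, 2015 at 14:30"))  # 2015-07-22 14:30:00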
����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.30.2/gallery_dl/extractor/imgur.py�����������������������������������������������������0000644�0001750�0001750�00000024640�15040344700�020240� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2015-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://imgur.com/""" from .common import Extractor, Message from .. import text, exception BASE_PATTERN = r"(?:https?://)?(?:www\.|[im]\.)?imgur\.(?:com|io)" class ImgurExtractor(Extractor): """Base class for imgur extractors""" category = "imgur" root = "https://imgur.com" def __init__(self, match): Extractor.__init__(self, match) self.key = match[1] def _init(self): self.api = ImgurAPI(self) self.mp4 = self.config("mp4", True) def _prepare(self, image): image.update(image["metadata"]) del image["metadata"] if image["ext"] == "jpeg": image["ext"] = "jpg" elif image["is_animated"] and self.mp4 and image["ext"] == "gif": image["ext"] = "mp4" image["url"] = url = \ f"https://i.imgur.com/{image['id']}.{image['ext']}" image["date"] = text.parse_datetime(image["created_at"]) image["_http_validate"] = self._validate text.nameext_from_url(url, image) return url def _validate(self, response): return (not response.history or not response.url.endswith("/removed.png")) def _items_queue(self, items): album_ex = ImgurAlbumExtractor image_ex = ImgurImageExtractor for item in items: if item["is_album"]: url = "https://imgur.com/a/" + item["id"] item["_extractor"] = album_ex else: url = "https://imgur.com/" + item["id"] item["_extractor"] = image_ex yield Message.Queue, url, item class ImgurImageExtractor(ImgurExtractor): """Extractor for individual images on imgur.com""" subcategory = "image" filename_fmt = "{category}_{id}{title:?_//}.{extension}" archive_fmt = "{id}" pattern = (BASE_PATTERN + r"/(?!gallery|search)" r"(?:r/\w+/)?(?:[^/?#]+-)?(\w{7}|\w{5})[sbtmlh]?") example = "https://imgur.com/abcdefg" def items(self): image = self.api.image(self.key) try: del image["ad_url"] del image["ad_type"] except KeyError: pass image.update(image["media"][0]) del image["media"] url = self._prepare(image) yield Message.Directory, image yield Message.Url, url, image class ImgurAlbumExtractor(ImgurExtractor): """Extractor for imgur albums""" subcategory = "album" directory_fmt = ("{category}", "{album[id]}{album[title]:? 
- //}") filename_fmt = "{category}_{album[id]}_{num:>03}_{id}.{extension}" archive_fmt = "{album[id]}_{id}" pattern = BASE_PATTERN + r"/a/(?:[^/?#]+-)?(\w{7}|\w{5})" example = "https://imgur.com/a/abcde" def items(self): album = self.api.album(self.key) try: images = album["media"] except KeyError: return del album["media"] count = len(images) album["date"] = text.parse_datetime(album["created_at"]) try: del album["ad_url"] del album["ad_type"] except KeyError: pass for num, image in enumerate(images, 1): url = self._prepare(image) image["num"] = num image["count"] = count image["album"] = album yield Message.Directory, image yield Message.Url, url, image class ImgurGalleryExtractor(ImgurExtractor): """Extractor for imgur galleries""" subcategory = "gallery" pattern = BASE_PATTERN + r"/(?:gallery|t/\w+)/(?:[^/?#]+-)?(\w{7}|\w{5})" example = "https://imgur.com/gallery/abcde" def items(self): if self.api.gallery(self.key)["is_album"]: url = f"{self.root}/a/{self.key}" extr = ImgurAlbumExtractor else: url = f"{self.root}/{self.key}" extr = ImgurImageExtractor yield Message.Queue, url, {"_extractor": extr} class ImgurUserExtractor(ImgurExtractor): """Extractor for all images posted by a user""" subcategory = "user" pattern = (BASE_PATTERN + r"/user/(?!me(?:/|$|\?|#))" r"([^/?#]+)(?:/posts|/submitted)?/?$") example = "https://imgur.com/user/USER" def items(self): return self._items_queue(self.api.account_submissions(self.key)) class ImgurFavoriteExtractor(ImgurExtractor): """Extractor for a user's favorites""" subcategory = "favorite" pattern = BASE_PATTERN + r"/user/([^/?#]+)/favorites/?$" example = "https://imgur.com/user/USER/favorites" def items(self): return self._items_queue(self.api.account_favorites(self.key)) class ImgurFavoriteFolderExtractor(ImgurExtractor): """Extractor for a user's favorites folder""" subcategory = "favorite-folder" pattern = BASE_PATTERN + r"/user/([^/?#]+)/favorites/folder/(\d+)" example = "https://imgur.com/user/USER/favorites/folder/12345/TITLE" def __init__(self, match): ImgurExtractor.__init__(self, match) self.folder_id = match[2] def items(self): return self._items_queue(self.api.account_favorites_folder( self.key, self.folder_id)) class ImgurMeExtractor(ImgurExtractor): """Extractor for your personal uploads""" subcategory = "me" pattern = BASE_PATTERN + r"/user/me(?:/posts)?(/hidden)?" 
example = "https://imgur.com/user/me" def items(self): if not self.cookies_check(("accesstoken",)): self.log.error("'accesstoken' cookie required") if self.groups[0]: posts = self.api.accounts_me_hiddenalbums() else: posts = self.api.accounts_me_allposts() return self._items_queue(posts) class ImgurSubredditExtractor(ImgurExtractor): """Extractor for a subreddits's imgur links""" subcategory = "subreddit" pattern = BASE_PATTERN + r"/r/([^/?#]+)/?$" example = "https://imgur.com/r/SUBREDDIT" def items(self): return self._items_queue(self.api.gallery_subreddit(self.key)) class ImgurTagExtractor(ImgurExtractor): """Extractor for imgur tag searches""" subcategory = "tag" pattern = BASE_PATTERN + r"/t/([^/?#]+)$" example = "https://imgur.com/t/TAG" def items(self): return self._items_queue(self.api.gallery_tag(self.key)) class ImgurSearchExtractor(ImgurExtractor): """Extractor for imgur search results""" subcategory = "search" pattern = BASE_PATTERN + r"/search(?:/[^?#]+)?/?\?q=([^&#]+)" example = "https://imgur.com/search?q=UERY" def items(self): key = text.unquote(self.key.replace("+", " ")) return self._items_queue(self.api.gallery_search(key)) class ImgurAPI(): """Interface for the Imgur API Ref: https://apidocs.imgur.com/ """ def __init__(self, extractor): self.extractor = extractor self.client_id = extractor.config("client-id") or "546c25a59c58ad7" self.headers = {"Authorization": "Client-ID " + self.client_id} def account_submissions(self, account): endpoint = f"/3/account/{account}/submissions" return self._pagination(endpoint) def account_favorites(self, account): endpoint = f"/3/account/{account}/gallery_favorites" return self._pagination(endpoint) def account_favorites_folder(self, account, folder_id): endpoint = f"/3/account/{account}/folders/{folder_id}/favorites" return self._pagination_v2(endpoint) def accounts_me_allposts(self): endpoint = "/post/v1/accounts/me/all_posts" params = { "include": "media,tags,account", "page" : 1, "sort" : "-created_at", } return self._pagination_v2(endpoint, params) def accounts_me_hiddenalbums(self): endpoint = "/post/v1/accounts/me/hidden_albums" params = { "include": "media,tags,account", "page" : 1, "sort" : "-created_at", } return self._pagination_v2(endpoint, params) def gallery_search(self, query): endpoint = "/3/gallery/search" params = {"q": query} return self._pagination(endpoint, params) def gallery_subreddit(self, subreddit): endpoint = f"/3/gallery/r/{subreddit}" return self._pagination(endpoint) def gallery_tag(self, tag): endpoint = f"/3/gallery/t/{tag}" return self._pagination(endpoint, key="items") def image(self, image_hash): endpoint = "/post/v1/media/" + image_hash params = {"include": "media,tags,account"} return self._call(endpoint, params) def album(self, album_hash): endpoint = "/post/v1/albums/" + album_hash params = {"include": "media,tags,account"} return self._call(endpoint, params) def gallery(self, gallery_hash): endpoint = "/post/v1/posts/" + gallery_hash return self._call(endpoint) def _call(self, endpoint, params=None, headers=None): while True: try: return self.extractor.request_json( "https://api.imgur.com" + endpoint, params=params, headers=(headers or self.headers)) except exception.HttpError as exc: if exc.status not in (403, 429) or \ b"capacity" not in exc.response.content: raise self.extractor.wait(seconds=600) def _pagination(self, endpoint, params=None, key=None): num = 0 while True: data = self._call(f"{endpoint}/{num}", params)["data"] if key: data = data[key] if not data: return yield from data num += 1 
    def _pagination_v2(self, endpoint, params=None, key=None):
        if params is None:
            params = {}
        params["client_id"] = self.client_id
        if "page" not in params:
            params["page"] = 0
        if "sort" not in params:
            params["sort"] = "newest"
        headers = {"Origin": "https://imgur.com"}

        while True:
            data = self._call(endpoint, params, headers)
            if "data" in data:
                data = data["data"]
            if not data:
                return
            yield from data
            params["page"] += 1

gallery_dl-1.30.2/gallery_dl/extractor/imhentai.py

# -*- coding: utf-8 -*-

# Copyright 2025 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for https://imhentai.xxx/ and mirror sites"""

from .common import GalleryExtractor, BaseExtractor, Message
from ..
import text, util class ImhentaiExtractor(BaseExtractor): basecategory = "IMHentai" def _pagination(self, url): prev = None base = self.root + "/gallery/" data = {"_extractor": ImhentaiGalleryExtractor} while True: page = self.request(url).text pos = page.find('class="ranking_list"') if pos >= 0: page = page[:pos] extr = text.extract_from(page) while True: gallery_id = extr('href="/gallery/', '"') if gallery_id == prev: continue if not gallery_id: break yield Message.Queue, base + gallery_id, data prev = gallery_id href = text.rextr(page, "class='page-link' href='", "'") if not href or href == "#": return if href[0] == "/": if href[1] == "/": href = "https:" + href else: href = self.root + href url = href BASE_PATTERN = ImhentaiExtractor.update({ "imhentai": { "root": "https://imhentai.xxx", "pattern": r"(?:www\.)?imhentai\.xxx", }, "hentaiera": { "root": "https://hentaiera.com", "pattern": r"(?:www\.)?hentaiera\.com", }, "hentairox": { "root": "https://hentairox.com", "pattern": r"(?:www\.)?hentairox\.com", }, "hentaifox": { "root": "https://hentaifox.com", "pattern": r"(?:www\.)?hentaifox\.com", }, "hentaienvy": { "root": "https://hentaienvy.com", "pattern": r"(?:www\.)?hentaienvy\.com", }, "hentaizap": { "root": "https://hentaizap.com", "pattern": r"(?:www\.)?hentaizap\.com", }, }) class ImhentaiGalleryExtractor(ImhentaiExtractor, GalleryExtractor): """Extractor for imhentai galleries""" pattern = BASE_PATTERN + r"/(?:gallery|view)/(\d+)" example = "https://imhentai.xxx/gallery/12345/" def __init__(self, match): ImhentaiExtractor.__init__(self, match) self.gallery_id = self.groups[-1] self.page_url = f"{self.root}/gallery/{self.gallery_id}/" def metadata(self, page): extr = text.extract_from(page) title = extr("<h1>", "<") title_alt = extr('class="subtitle">', "<") end = "</li>" if extr('<ul class="galleries_info', ">") else "</ul>" data = { "gallery_id": text.parse_int(self.gallery_id), "title" : text.unescape(title), "title_alt" : text.unescape(title_alt), "parody" : self._split(extr(">Parodies", end)), "character" : self._split(extr(">Characters", end)), "tags" : self._split(extr(">Tags", end)), "artist" : self._split(extr(">Artists", end)), "group" : self._split(extr(">Groups", end)), "language" : self._split(extr(">Languages", end)), "type" : extr("href='/category/", "/"), } if data["language"]: data["lang"] = util.language_to_code(data["language"][0]) return data def _split(self, html): results = [] for tag in text.extract_iter(html, ">", "</a>"): badge = ("badge'>" in tag or "class='badge" in tag) tag = text.remove_html(tag) if badge: tag = tag.rpartition(" ")[0] results.append(tag) results.sort() return results def images(self, page): data = util.json_loads(text.extr(page, "$.parseJSON('", "'")) base = text.extr(page, 'data-src="', '"').rpartition("/")[0] + "/" exts = {"j": "jpg", "p": "png", "g": "gif", "w": "webp", "a": "avif"} results = [] for i in map(str, range(1, len(data)+1)): ext, width, height = data[i].split(",") url = base + i + "." 
+ exts[ext] results.append((url, { "width" : text.parse_int(width), "height": text.parse_int(height), })) return results class ImhentaiTagExtractor(ImhentaiExtractor): """Extractor for imhentai tag searches""" subcategory = "tag" pattern = (BASE_PATTERN + r"(/(?:" r"artist|category|character|group|language|parody|tag" r")/([^/?#]+))") example = "https://imhentai.xxx/tag/TAG/" def items(self): url = self.root + self.groups[-2] + "/" return self._pagination(url) class ImhentaiSearchExtractor(ImhentaiExtractor): """Extractor for imhentai search results""" subcategory = "search" pattern = BASE_PATTERN + r"/search(/?\?[^#]+|/[^/?#]+/?)" example = "https://imhentai.xxx/search/?key=QUERY" def items(self): url = self.root + "/search" + self.groups[-1] return self._pagination(url) �������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1753336256.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.30.2/gallery_dl/extractor/inkbunny.py��������������������������������������������������0000644�0001750�0001750�00000031070�15040344700�020745� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2020-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://inkbunny.net/""" from .common import Extractor, Message from .. 
import text, exception from ..cache import cache BASE_PATTERN = r"(?:https?://)?(?:www\.)?inkbunny\.net" class InkbunnyExtractor(Extractor): """Base class for inkbunny extractors""" category = "inkbunny" directory_fmt = ("{category}", "{username!l}") filename_fmt = "{submission_id} {file_id} {title}.{extension}" archive_fmt = "{file_id}" root = "https://inkbunny.net" def _init(self): self.api = InkbunnyAPI(self) def items(self): self.api.authenticate() metadata = self.metadata() to_bool = ("deleted", "favorite", "friends_only", "guest_block", "hidden", "public", "scraps") for post in self.posts(): post.update(metadata) post["date"] = text.parse_datetime( post["create_datetime"] + "00", "%Y-%m-%d %H:%M:%S.%f%z") post["tags"] = [kw["keyword_name"] for kw in post["keywords"]] post["ratings"] = [r["name"] for r in post["ratings"]] files = post["files"] for key in to_bool: if key in post: post[key] = (post[key] == "t") del post["keywords"] del post["files"] yield Message.Directory, post for post["num"], file in enumerate(files, 1): post.update(file) post["deleted"] = (file["deleted"] == "t") post["date"] = text.parse_datetime( file["create_datetime"] + "00", "%Y-%m-%d %H:%M:%S.%f%z") text.nameext_from_url(file["file_name"], post) url = file["file_url_full"] if "/private_files/" in url: url += "?sid=" + self.api.session_id yield Message.Url, url, post def posts(self): return () def metadata(self): return () class InkbunnyUserExtractor(InkbunnyExtractor): """Extractor for inkbunny user profiles""" subcategory = "user" pattern = BASE_PATTERN + r"/(?!s/)(gallery/|scraps/)?(\w+)(?:$|[/?#])" example = "https://inkbunny.net/USER" def __init__(self, match): kind, self.user = match.groups() if not kind: self.scraps = None elif kind[0] == "g": self.subcategory = "gallery" self.scraps = "no" else: self.subcategory = "scraps" self.scraps = "only" InkbunnyExtractor.__init__(self, match) def posts(self): orderby = self.config("orderby") params = { "username": self.user, "scraps" : self.scraps, "orderby" : orderby, } if orderby and orderby.startswith("unread_"): params["unread_submissions"] = "yes" return self.api.search(params) class InkbunnyPoolExtractor(InkbunnyExtractor): """Extractor for inkbunny pools""" subcategory = "pool" pattern = (BASE_PATTERN + r"/(?:" r"poolview_process\.php\?pool_id=(\d+)|" r"submissionsviewall\.php" r"\?((?:[^#]+&)?mode=pool(?:&[^#]+)?))") example = "https://inkbunny.net/poolview_process.php?pool_id=12345" def __init__(self, match): InkbunnyExtractor.__init__(self, match) if pid := match[1]: self.pool_id = pid self.orderby = "pool_order" else: params = text.parse_query(match[2]) self.pool_id = params.get("pool_id") self.orderby = params.get("orderby", "pool_order") def metadata(self): return {"pool_id": self.pool_id} def posts(self): params = { "pool_id": self.pool_id, "orderby": self.orderby, } return self.api.search(params) class InkbunnyFavoriteExtractor(InkbunnyExtractor): """Extractor for inkbunny user favorites""" subcategory = "favorite" directory_fmt = ("{category}", "{favs_username!l}", "Favorites") pattern = (BASE_PATTERN + r"/(?:" r"userfavorites_process\.php\?favs_user_id=(\d+)|" r"submissionsviewall\.php" r"\?((?:[^#]+&)?mode=userfavs(?:&[^#]+)?))") example = ("https://inkbunny.net/userfavorites_process.php" "?favs_user_id=12345") def __init__(self, match): InkbunnyExtractor.__init__(self, match) if uid := match[1]: self.user_id = uid self.orderby = self.config("orderby", "fav_datetime") else: params = text.parse_query(match[2]) self.user_id = params.get("user_id") 
self.orderby = params.get("orderby", "fav_datetime") def metadata(self): # Lookup fav user ID as username url = (f"{self.root}/userfavorites_process.php" f"?favs_user_id={self.user_id}") page = self.request(url).text user_link = text.extr(page, '<a rel="author"', '</a>') favs_username = text.extr(user_link, 'href="/', '"') return { "favs_user_id": self.user_id, "favs_username": favs_username, } def posts(self): params = { "favs_user_id": self.user_id, "orderby" : self.orderby, } if self.orderby and self.orderby.startswith("unread_"): params["unread_submissions"] = "yes" return self.api.search(params) class InkbunnyUnreadExtractor(InkbunnyExtractor): """Extractor for unread inkbunny submissions""" subcategory = "unread" pattern = (BASE_PATTERN + r"/submissionsviewall\.php" r"\?((?:[^#]+&)?mode=unreadsubs(?:&[^#]+)?)") example = ("https://inkbunny.net/submissionsviewall.php" "?text=&mode=unreadsubs&type=") def __init__(self, match): InkbunnyExtractor.__init__(self, match) self.params = text.parse_query(match[1]) def posts(self): params = self.params.copy() params.pop("rid", None) params.pop("mode", None) params["unread_submissions"] = "yes" return self.api.search(params) class InkbunnySearchExtractor(InkbunnyExtractor): """Extractor for inkbunny search results""" subcategory = "search" pattern = (BASE_PATTERN + r"/submissionsviewall\.php" r"\?((?:[^#]+&)?mode=search(?:&[^#]+)?)") example = ("https://inkbunny.net/submissionsviewall.php" "?text=TAG&mode=search&type=") def __init__(self, match): InkbunnyExtractor.__init__(self, match) self.params = text.parse_query(match[1]) def metadata(self): return {"search": self.params} def posts(self): params = self.params.copy() pop = params.pop pop("rid", None) params["string_join_type"] = pop("stringtype", None) params["dayslimit"] = pop("days", None) params["username"] = pop("artist", None) if favsby := pop("favsby", None): # get user_id from user profile url = f"{self.root}/{favsby}" page = self.request(url).text user_id = text.extr(page, "?user_id=", "'") params["favs_user_id"] = user_id.partition("&")[0] return self.api.search(params) class InkbunnyFollowingExtractor(InkbunnyExtractor): """Extractor for inkbunny user watches""" subcategory = "following" pattern = (BASE_PATTERN + r"/(?:" r"watchlist_process\.php\?mode=watching&user_id=(\d+)|" r"usersviewall\.php" r"\?((?:[^#]+&)?mode=watching(?:&[^#]+)?))") example = ("https://inkbunny.net/watchlist_process.php" "?mode=watching&user_id=12345") def __init__(self, match): InkbunnyExtractor.__init__(self, match) self.user_id = match[1] or \ text.parse_query(match[2]).get("user_id") def items(self): url = self.root + "/watchlist_process.php" params = {"mode": "watching", "user_id": self.user_id} with self.request(url, params=params) as response: url, _, params = response.url.partition("?") page = response.text params = text.parse_query(params) params["page"] = text.parse_int(params.get("page"), 1) data = {"_extractor": InkbunnyUserExtractor} while True: for user in text.extract_iter( page, '<a class="widget_userNameSmall" href="', '"', page.index('id="changethumboriginal_form"')): yield Message.Queue, self.root + user, data if "<a title='next page' " not in page: return params["page"] += 1 page = self.request(url, params=params).text class InkbunnyPostExtractor(InkbunnyExtractor): """Extractor for individual Inkbunny posts""" subcategory = "post" pattern = BASE_PATTERN + r"/s/(\d+)" example = "https://inkbunny.net/s/12345" def __init__(self, match): InkbunnyExtractor.__init__(self, match) 
self.submission_id = match[1] def posts(self): submissions = self.api.detail(({"submission_id": self.submission_id},)) if submissions[0] is None: raise exception.NotFoundError("submission") return submissions class InkbunnyAPI(): """Interface for the Inkunny API Ref: https://wiki.inkbunny.net/wiki/API """ def __init__(self, extractor): self.extractor = extractor self.session_id = None def detail(self, submissions): """Get full details about submissions with the given IDs""" ids = { sub["submission_id"]: idx for idx, sub in enumerate(submissions) } params = { "submission_ids": ",".join(ids), "show_description": "yes", "show_pools": "yes", } submissions = [None] * len(ids) for sub in self._call("submissions", params)["submissions"]: submissions[ids[sub["submission_id"]]] = sub return submissions def search(self, params): """Perform a search""" return self._pagination_search(params) def set_allowed_ratings(self, nudity=True, sexual=True, violence=True, strong_violence=True): """Change allowed submission ratings""" params = { "tag[2]": "yes" if nudity else "no", "tag[3]": "yes" if violence else "no", "tag[4]": "yes" if sexual else "no", "tag[5]": "yes" if strong_violence else "no", } self._call("userrating", params) def authenticate(self, invalidate=False): username, password = self.extractor._get_auth_info() if invalidate: _authenticate_impl.invalidate(username or "guest") if username: self.session_id = _authenticate_impl(self, username, password) else: self.session_id = _authenticate_impl(self, "guest", "") self.set_allowed_ratings() def _call(self, endpoint, params): url = "https://inkbunny.net/api_" + endpoint + ".php" while True: params["sid"] = self.session_id data = self.extractor.request_json(url, params=params) if "error_code" not in data: return data if str(data["error_code"]) == "2": self.authenticate(invalidate=True) continue raise exception.AbortExtraction(data.get("error_message")) def _pagination_search(self, params): params["page"] = 1 params["get_rid"] = "yes" params["submission_ids_only"] = "yes" while True: data = self._call("search", params) if not data["submissions"]: return yield from self.detail(data["submissions"]) if data["page"] >= data["pages_count"]: return if "get_rid" in params: del params["get_rid"] params["rid"] = data["rid"] params["page"] += 1 @cache(maxage=365*86400, keyarg=1) def _authenticate_impl(api, username, password): api.extractor.log.info("Logging in as %s", username) url = "https://inkbunny.net/api_login.php" data = {"username": username, "password": password} data = api.extractor.request_json(url, method="POST", data=data) if "sid" not in data: raise exception.AuthenticationError(data.get("error_message")) return data["sid"] ������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� 
x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1753336256.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.30.2/gallery_dl/extractor/instagram.py�������������������������������������������������0000644�0001750�0001750�00000112525�15040344700�021102� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2018-2020 Leonardo Taccari # Copyright 2018-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.instagram.com/""" from .common import Extractor, Message, Dispatch from .. import text, util, exception from ..cache import cache, memcache import itertools import binascii BASE_PATTERN = r"(?:https?://)?(?:www\.)?instagram\.com" USER_PATTERN = BASE_PATTERN + r"/(?!(?:p|tv|reel|explore|stories)/)([^/?#]+)" class InstagramExtractor(Extractor): """Base class for instagram extractors""" category = "instagram" directory_fmt = ("{category}", "{username}") filename_fmt = "{sidecar_media_id:?/_/}{media_id}.{extension}" archive_fmt = "{media_id}" root = "https://www.instagram.com" cookies_domain = ".instagram.com" cookies_names = ("sessionid",) useragent = util.USERAGENT_CHROME request_interval = (6.0, 12.0) def __init__(self, match): Extractor.__init__(self, match) self.item = match[1] def _init(self): self.www_claim = "0" self.csrf_token = util.generate_token() self._find_tags = util.re(r"#\w+").findall self._logged_in = True self._cursor = None self._user = None self.cookies.set( "csrftoken", self.csrf_token, domain=self.cookies_domain) if self.config("api") == "graphql": self.api = InstagramGraphqlAPI(self) else: self.api = InstagramRestAPI(self) def items(self): self.login() data = self.metadata() if videos := self.config("videos", True): videos_dash = (videos != "merged") videos_headers = {"User-Agent": "Mozilla/5.0"} previews = self.config("previews", False) max_posts = self.config("max-posts") order = self.config("order-files") reverse = order[0] in ("r", "d") if order else False posts = self.posts() if max_posts: posts = itertools.islice(posts, max_posts) for post in posts: if "__typename" in post: post = self._parse_post_graphql(post) else: post = self._parse_post_rest(post) if self._user: post["user"] = self._user post.update(data) files = post.pop("_files") post["count"] = len(files) yield Message.Directory, post if "date" 
in post: del post["date"] if reverse: files.reverse() for file in files: file.update(post) if url := file.get("video_url"): if videos: file["_http_headers"] = videos_headers text.nameext_from_url(url, file) if videos_dash: file["_fallback"] = (url,) file["_ytdl_manifest"] = "dash" url = f"ytdl:{post['post_url']}{file['num']}.mp4" yield Message.Url, url, file if previews: file["media_id"] += "p" else: continue url = file["display_url"] text.nameext_from_url(url, file) if file["extension"] == "webp" and "stp=dst-jpg" in url: file["extension"] = "jpg" yield Message.Url, url, file def metadata(self): return () def posts(self): return () def finalize(self): if self._cursor: self.log.info("Use '-o cursor=%s' to continue downloading " "from the current position", self._cursor) def request(self, url, **kwargs): response = Extractor.request(self, url, **kwargs) if response.history: url = response.url if "/accounts/login/" in url: page = "login" elif "/challenge/" in url: page = "challenge" else: page = None if page is not None: raise exception.AbortExtraction( f"HTTP redirect to {page} page ({url.partition('?')[0]})") www_claim = response.headers.get("x-ig-set-www-claim") if www_claim is not None: self.www_claim = www_claim if csrf_token := response.cookies.get("csrftoken"): self.csrf_token = csrf_token return response def login(self): if self.cookies_check(self.cookies_names): return username, password = self._get_auth_info() if username: return self.cookies_update(_login_impl(self, username, password)) self._logged_in = False def _parse_post_rest(self, post): if "items" in post: # story or highlight items = post["items"] reel_id = str(post["id"]).rpartition(":")[2] if expires := post.get("expiring_at"): post_url = f"{self.root}/stories/{post['user']['username']}/" else: post_url = f"{self.root}/stories/highlights/{reel_id}/" data = { "expires": text.parse_timestamp(expires), "post_id": reel_id, "post_shortcode": shortcode_from_id(reel_id), "post_url": post_url, } if "title" in post: data["highlight_title"] = post["title"] if expires and not post.get("seen"): post["seen"] = expires - 86400 else: # regular image/video post data = { "post_id" : post["pk"], "post_shortcode": post["code"], "post_url": f"{self.root}/p/{post['code']}/", "likes": post.get("like_count", 0), "liked": post.get("has_liked", False), "pinned": self._extract_pinned(post), } caption = post["caption"] data["description"] = caption["text"] if caption else "" if tags := self._find_tags(data["description"]): data["tags"] = sorted(set(tags)) if location := post.get("location"): slug = location["short_name"].replace(" ", "-").lower() data["location_id"] = location["pk"] data["location_slug"] = slug data["location_url"] = \ f"{self.root}/explore/locations/{location['pk']}/{slug}/" if coauthors := post.get("coauthor_producers"): data["coauthors"] = [ {"id" : user["pk"], "username" : user["username"], "full_name": user["full_name"]} for user in coauthors ] if items := post.get("carousel_media"): data["sidecar_media_id"] = data["post_id"] data["sidecar_shortcode"] = data["post_shortcode"] else: items = (post,) owner = post["user"] data["owner_id"] = owner["pk"] data["username"] = owner.get("username") data["fullname"] = owner.get("full_name") data["post_date"] = data["date"] = text.parse_timestamp( post.get("taken_at") or post.get("created_at") or post.get("seen")) data["_files"] = files = [] for num, item in enumerate(items, 1): try: image = item["image_versions2"]["candidates"][0] except Exception: self.log.warning("Missing media in 
post %s", data["post_shortcode"]) continue if video_versions := item.get("video_versions"): video = max( video_versions, key=lambda x: (x["width"], x["height"], x["type"]), ) media = video else: video = None media = image media = { "num" : num, "date" : text.parse_timestamp(item.get("taken_at") or media.get("taken_at") or post.get("taken_at")), "media_id" : item["pk"], "shortcode" : (item.get("code") or shortcode_from_id(item["pk"])), "display_url": image["url"], "video_url" : video["url"] if video else None, "width" : media["width"], "height" : media["height"], "_ytdl_manifest_data": item.get("video_dash_manifest"), } if "expiring_at" in item: media["expires"] = text.parse_timestamp(post["expiring_at"]) self._extract_tagged_users(item, media) files.append(media) return data def _parse_post_graphql(self, post): typename = post["__typename"] if self._logged_in: if post.get("is_video") and "video_url" not in post: post = self.api.media(post["id"])[0] elif typename == "GraphSidecar" and \ "edge_sidecar_to_children" not in post: post = self.api.media(post["id"])[0] if pinned := post.get("pinned_for_users", ()): for index, user in enumerate(pinned): pinned[index] = int(user["id"]) owner = post["owner"] data = { "typename" : typename, "likes" : post["edge_media_preview_like"]["count"], "liked" : post.get("viewer_has_liked", False), "pinned" : pinned, "owner_id" : owner["id"], "username" : owner.get("username"), "fullname" : owner.get("full_name"), "post_id" : post["id"], "post_shortcode": post["shortcode"], "post_url" : f"{self.root}/p/{post['shortcode']}/", "post_date" : text.parse_timestamp(post["taken_at_timestamp"]), "description": text.parse_unicode_escapes("\n".join( edge["node"]["text"] for edge in post["edge_media_to_caption"]["edges"] )), } data["date"] = data["post_date"] if tags := self._find_tags(data["description"]): data["tags"] = sorted(set(tags)) if location := post.get("location"): data["location_id"] = location["id"] data["location_slug"] = location["slug"] data["location_url"] = (f"{self.root}/explore/locations/" f"{location['id']}/{location['slug']}/") if coauthors := post.get("coauthor_producers"): data["coauthors"] = [ {"id" : user["id"], "username": user["username"]} for user in coauthors ] data["_files"] = files = [] if "edge_sidecar_to_children" in post: for num, edge in enumerate( post["edge_sidecar_to_children"]["edges"], 1): node = edge["node"] dimensions = node["dimensions"] media = { "num": num, "media_id" : node["id"], "date" : data["date"], "shortcode" : (node.get("shortcode") or shortcode_from_id(node["id"])), "display_url": node["display_url"], "video_url" : node.get("video_url"), "width" : dimensions["width"], "height" : dimensions["height"], "sidecar_media_id" : post["id"], "sidecar_shortcode": post["shortcode"], } self._extract_tagged_users(node, media) files.append(media) else: dimensions = post["dimensions"] media = { "media_id" : post["id"], "date" : data["date"], "shortcode" : post["shortcode"], "display_url": post["display_url"], "video_url" : post.get("video_url"), "width" : dimensions["width"], "height" : dimensions["height"], } self._extract_tagged_users(post, media) files.append(media) return data def _extract_tagged_users(self, src, dest): dest["tagged_users"] = tagged_users = [] if edges := src.get("edge_media_to_tagged_user"): for edge in edges["edges"]: user = edge["node"]["user"] tagged_users.append({"id" : user["id"], "username" : user["username"], "full_name": user["full_name"]}) if usertags := src.get("usertags"): for tag in usertags["in"]: 
user = tag["user"] tagged_users.append({"id" : user["pk"], "username" : user["username"], "full_name": user["full_name"]}) if mentions := src.get("reel_mentions"): for mention in mentions: user = mention["user"] tagged_users.append({"id" : user.get("pk"), "username" : user["username"], "full_name": user["full_name"]}) if stickers := src.get("story_bloks_stickers"): for sticker in stickers: sticker = sticker["bloks_sticker"] if sticker["bloks_sticker_type"] == "mention": user = sticker["sticker_data"]["ig_mention"] tagged_users.append({"id" : user["account_id"], "username" : user["username"], "full_name": user["full_name"]}) def _extract_pinned(self, post): return (post.get("timeline_pinned_user_ids") or post.get("clips_tab_pinned_user_ids") or ()) def _init_cursor(self): cursor = self.config("cursor", True) if cursor is True: return None elif not cursor: self._update_cursor = util.identity return cursor def _update_cursor(self, cursor): self.log.debug("Cursor: %s", cursor) self._cursor = cursor return cursor def _assign_user(self, user): self._user = user for key, old in ( ("count_media" , "edge_owner_to_timeline_media"), ("count_video" , "edge_felix_video_timeline"), ("count_saved" , "edge_saved_media"), ("count_mutual" , "edge_mutual_followed_by"), ("count_follow" , "edge_follow"), ("count_followed" , "edge_followed_by"), ("count_collection", "edge_media_collections")): try: user[key] = user.pop(old)["count"] except Exception: user[key] = 0 class InstagramUserExtractor(Dispatch, InstagramExtractor): """Extractor for an Instagram user profile""" pattern = USER_PATTERN + r"/?(?:$|[?#])" example = "https://www.instagram.com/USER/" def items(self): base = f"{self.root}/{self.item}/" stories = f"{self.root}/stories/{self.item}/" return self._dispatch_extractors(( (InstagramInfoExtractor , base + "info/"), (InstagramAvatarExtractor , base + "avatar/"), (InstagramStoriesExtractor , stories), (InstagramHighlightsExtractor, base + "highlights/"), (InstagramPostsExtractor , base + "posts/"), (InstagramReelsExtractor , base + "reels/"), (InstagramTaggedExtractor , base + "tagged/"), ), ("posts",)) class InstagramPostsExtractor(InstagramExtractor): """Extractor for an Instagram user's posts""" subcategory = "posts" pattern = USER_PATTERN + r"/posts" example = "https://www.instagram.com/USER/posts/" def posts(self): uid = self.api.user_id(self.item) return self.api.user_feed(uid) def _extract_pinned(self, post): try: return post["timeline_pinned_user_ids"] except KeyError: return () class InstagramReelsExtractor(InstagramExtractor): """Extractor for an Instagram user's reels""" subcategory = "reels" pattern = USER_PATTERN + r"/reels" example = "https://www.instagram.com/USER/reels/" def posts(self): uid = self.api.user_id(self.item) return self.api.user_clips(uid) def _extract_pinned(self, post): try: return post["clips_tab_pinned_user_ids"] except KeyError: return () class InstagramTaggedExtractor(InstagramExtractor): """Extractor for an Instagram user's tagged posts""" subcategory = "tagged" pattern = USER_PATTERN + r"/tagged" example = "https://www.instagram.com/USER/tagged/" def metadata(self): if self.item.startswith("id:"): self.user_id = self.item[3:] return {"tagged_owner_id": self.user_id} self.user_id = self.api.user_id(self.item) user = self.api.user_by_name(self.item) return { "tagged_owner_id" : user["id"], "tagged_username" : user["username"], "tagged_full_name": user["full_name"], } def posts(self): return self.api.user_tagged(self.user_id) class 
InstagramGuideExtractor(InstagramExtractor): """Extractor for an Instagram guide""" subcategory = "guide" pattern = USER_PATTERN + r"/guide/[^/?#]+/(\d+)" example = "https://www.instagram.com/USER/guide/NAME/12345" def __init__(self, match): InstagramExtractor.__init__(self, match) self.guide_id = match[2] def metadata(self): return {"guide": self.api.guide(self.guide_id)} def posts(self): return self.api.guide_media(self.guide_id) class InstagramSavedExtractor(InstagramExtractor): """Extractor for an Instagram user's saved media""" subcategory = "saved" pattern = USER_PATTERN + r"/saved(?:/all-posts)?/?$" example = "https://www.instagram.com/USER/saved/" def posts(self): return self.api.user_saved() class InstagramCollectionExtractor(InstagramExtractor): """Extractor for Instagram collection""" subcategory = "collection" pattern = USER_PATTERN + r"/saved/([^/?#]+)/([^/?#]+)" example = "https://www.instagram.com/USER/saved/COLLECTION/12345" def __init__(self, match): InstagramExtractor.__init__(self, match) self.user, self.collection_name, self.collection_id = match.groups() def metadata(self): return { "collection_id" : self.collection_id, "collection_name": text.unescape(self.collection_name), } def posts(self): return self.api.user_collection(self.collection_id) class InstagramStoriesExtractor(InstagramExtractor): """Extractor for Instagram stories""" subcategory = "stories" pattern = (r"(?:https?://)?(?:www\.)?instagram\.com" r"/s(?:tories/(?:highlights/(\d+)|([^/?#]+)(?:/(\d+))?)" r"|/(aGlnaGxpZ2h0[^?#]+)(?:\?story_media_id=(\d+))?)") example = "https://www.instagram.com/stories/USER/" def __init__(self, match): h1, self.user, m1, h2, m2 = match.groups() if self.user: self.highlight_id = None else: self.subcategory = InstagramHighlightsExtractor.subcategory self.highlight_id = ("highlight:" + h1 if h1 else binascii.a2b_base64(h2).decode()) self.media_id = m1 or m2 InstagramExtractor.__init__(self, match) def posts(self): reel_id = self.highlight_id or self.api.user_id(self.user) reels = self.api.reels_media(reel_id) if not reels: return () if self.media_id: reel = reels[0] for item in reel["items"]: if item["pk"] == self.media_id: reel["items"] = (item,) break else: raise exception.NotFoundError("story") elif self.config("split"): reel = reels[0] reels = [] for item in reel["items"]: item.pop("user", None) copy = reel.copy() copy.update(item) copy["items"] = (item,) reels.append(copy) return reels class InstagramHighlightsExtractor(InstagramExtractor): """Extractor for an Instagram user's story highlights""" subcategory = "highlights" pattern = USER_PATTERN + r"/highlights" example = "https://www.instagram.com/USER/highlights/" def posts(self): uid = self.api.user_id(self.item) return self.api.highlights_media(uid) class InstagramFollowersExtractor(InstagramExtractor): """Extractor for an Instagram user's followers""" subcategory = "followers" pattern = USER_PATTERN + r"/followers" example = "https://www.instagram.com/USER/followers/" def items(self): uid = self.api.user_id(self.item) for user in self.api.user_followers(uid): user["_extractor"] = InstagramUserExtractor url = f"{self.root}/{user['username']}" yield Message.Queue, url, user class InstagramFollowingExtractor(InstagramExtractor): """Extractor for an Instagram user's followed users""" subcategory = "following" pattern = USER_PATTERN + r"/following" example = "https://www.instagram.com/USER/following/" def items(self): uid = self.api.user_id(self.item) for user in self.api.user_following(uid): user["_extractor"] = 
InstagramUserExtractor url = f"{self.root}/{user['username']}" yield Message.Queue, url, user class InstagramTagExtractor(InstagramExtractor): """Extractor for Instagram tags""" subcategory = "tag" directory_fmt = ("{category}", "{subcategory}", "{tag}") pattern = BASE_PATTERN + r"/explore/tags/([^/?#]+)" example = "https://www.instagram.com/explore/tags/TAG/" def metadata(self): return {"tag": text.unquote(self.item)} def posts(self): return self.api.tags_media(self.item) class InstagramInfoExtractor(InstagramExtractor): """Extractor for an Instagram user's profile data""" subcategory = "info" pattern = USER_PATTERN + r"/info" example = "https://www.instagram.com/USER/info/" def items(self): screen_name = self.item if screen_name.startswith("id:"): user = self.api.user_by_id(screen_name[3:]) else: user = self.api.user_by_name(screen_name) return iter(((Message.Directory, user),)) class InstagramAvatarExtractor(InstagramExtractor): """Extractor for an Instagram user's avatar""" subcategory = "avatar" pattern = USER_PATTERN + r"/avatar" example = "https://www.instagram.com/USER/avatar/" def posts(self): if self._logged_in: user_id = self.api.user_id(self.item, check_private=False) user = self.api.user_by_id(user_id) avatar = (user.get("hd_profile_pic_url_info") or user["hd_profile_pic_versions"][-1]) else: user = self.item if user.startswith("id:"): user = self.api.user_by_id(user[3:]) else: user = self.api.user_by_name(user) user["pk"] = user["id"] url = user.get("profile_pic_url_hd") or user["profile_pic_url"] avatar = {"url": url, "width": 0, "height": 0} if pk := user.get("profile_pic_id"): pk = pk.partition("_")[0] code = shortcode_from_id(pk) else: pk = code = "avatar:" + str(user["pk"]) return ({ "pk" : pk, "code" : code, "user" : user, "caption" : None, "like_count": 0, "image_versions2": {"candidates": (avatar,)}, },) class InstagramPostExtractor(InstagramExtractor): """Extractor for an Instagram post""" subcategory = "post" pattern = (r"(?:https?://)?(?:www\.)?instagram\.com" r"/(?:share/()|[^/?#]+/)?(?:p|tv|reel)/([^/?#]+)") example = "https://www.instagram.com/p/abcdefg/" def posts(self): share, shortcode = self.groups if share is not None: url = text.ensure_http_scheme(self.url) headers = { "Sec-Fetch-Dest": "empty", "Sec-Fetch-Mode": "navigate", "Sec-Fetch-Site": "same-origin", } location = self.request_location(url, headers=headers) shortcode = location.split("/")[-2] return self.api.media(shortcode) class InstagramRestAPI(): def __init__(self, extractor): self.extractor = extractor def guide(self, guide_id): endpoint = "/v1/guides/web_info/" params = {"guide_id": guide_id} return self._call(endpoint, params=params) def guide_media(self, guide_id): endpoint = f"/v1/guides/guide/{guide_id}/" return self._pagination_guides(endpoint) def highlights_media(self, user_id, chunk_size=5): reel_ids = [hl["id"] for hl in self.highlights_tray(user_id)] if order := self.extractor.config("order-posts"): if order in ("desc", "reverse"): reel_ids.reverse() elif order in ("id", "id_asc"): reel_ids.sort(key=lambda r: int(r[10:])) elif order == "id_desc": reel_ids.sort(key=lambda r: int(r[10:]), reverse=True) elif order != "asc": self.extractor.log.warning("Unknown posts order '%s'", order) for offset in range(0, len(reel_ids), chunk_size): yield from self.reels_media( reel_ids[offset : offset+chunk_size]) def highlights_tray(self, user_id): endpoint = f"/v1/highlights/{user_id}/highlights_tray/" return self._call(endpoint)["tray"] def media(self, shortcode): if len(shortcode) > 28: shortcode 
= shortcode[:-28] endpoint = f"/v1/media/{id_from_shortcode(shortcode)}/info/" return self._pagination(endpoint) def reels_media(self, reel_ids): endpoint = "/v1/feed/reels_media/" params = {"reel_ids": reel_ids} try: return self._call(endpoint, params=params)["reels_media"] except KeyError: raise exception.AuthorizationError("Login required") def tags_media(self, tag): for section in self.tags_sections(tag): for media in section["layout_content"]["medias"]: yield media["media"] def tags_sections(self, tag): endpoint = f"/v1/tags/{tag}/sections/" data = { "include_persistent": "0", "max_id" : None, "page" : None, "surface": "grid", "tab" : "recent", } return self._pagination_sections(endpoint, data) @memcache(keyarg=1) def user_by_name(self, screen_name): endpoint = "/v1/users/web_profile_info/" params = {"username": screen_name} return self._call( endpoint, params=params, notfound="user")["data"]["user"] @memcache(keyarg=1) def user_by_id(self, user_id): endpoint = f"/v1/users/{user_id}/info/" return self._call(endpoint)["user"] def user_id(self, screen_name, check_private=True): if screen_name.startswith("id:"): if self.extractor.config("metadata"): self.extractor._user = self.user_by_id(screen_name[3:]) return screen_name[3:] user = self.user_by_name(screen_name) if user is None: raise exception.AuthorizationError( "Login required to access this profile") if check_private and user["is_private"] and \ not user["followed_by_viewer"]: name = user["username"] s = "" if name.endswith("s") else "s" self.extractor.log.warning("%s'%s posts are private", name, s) self.extractor._assign_user(user) return user["id"] def user_clips(self, user_id): endpoint = "/v1/clips/user/" data = { "target_user_id": user_id, "page_size": "50", "max_id": None, "include_feed_video": "true", } return self._pagination_post(endpoint, data) def user_collection(self, collection_id): endpoint = f"/v1/feed/collection/{collection_id}/posts/" params = {"count": 50} return self._pagination(endpoint, params, media=True) def user_feed(self, user_id): endpoint = f"/v1/feed/user/{user_id}/" params = {"count": 30} return self._pagination(endpoint, params) def user_followers(self, user_id): endpoint = f"/v1/friendships/{user_id}/followers/" params = {"count": 12} return self._pagination_following(endpoint, params) def user_following(self, user_id): endpoint = f"/v1/friendships/{user_id}/following/" params = {"count": 12} return self._pagination_following(endpoint, params) def user_saved(self): endpoint = "/v1/feed/saved/posts/" params = {"count": 50} return self._pagination(endpoint, params, media=True) def user_tagged(self, user_id): endpoint = f"/v1/usertags/{user_id}/feed/" params = {"count": 20} return self._pagination(endpoint, params) def _call(self, endpoint, **kwargs): extr = self.extractor url = "https://www.instagram.com/api" + endpoint kwargs["headers"] = { "Accept" : "*/*", "X-CSRFToken" : extr.csrf_token, "X-IG-App-ID" : "936619743392459", "X-ASBD-ID" : "129477", "X-IG-WWW-Claim" : extr.www_claim, "X-Requested-With": "XMLHttpRequest", "Connection" : "keep-alive", "Referer" : extr.root + "/", "Sec-Fetch-Dest" : "empty", "Sec-Fetch-Mode" : "cors", "Sec-Fetch-Site" : "same-origin", } return extr.request_json(url, **kwargs) def _pagination(self, endpoint, params=None, media=False): if params is None: params = {} extr = self.extractor params["max_id"] = extr._init_cursor() while True: data = self._call(endpoint, params=params) if media: for item in data["items"]: yield item["media"] else: yield from data["items"] if not 
data.get("more_available"): return extr._update_cursor(None) params["max_id"] = extr._update_cursor(data["next_max_id"]) def _pagination_post(self, endpoint, params): extr = self.extractor params["max_id"] = extr._init_cursor() while True: data = self._call(endpoint, method="POST", data=params) for item in data["items"]: yield item["media"] info = data["paging_info"] if not info.get("more_available"): return extr._update_cursor(None) params["max_id"] = extr._update_cursor(info["max_id"]) def _pagination_sections(self, endpoint, params): extr = self.extractor params["max_id"] = extr._init_cursor() while True: info = self._call(endpoint, method="POST", data=params) yield from info["sections"] if not info.get("more_available"): return extr._update_cursor(None) params["page"] = info["next_page"] params["max_id"] = extr._update_cursor(info["next_max_id"]) def _pagination_guides(self, endpoint): extr = self.extractor params = {"max_id": extr._init_cursor()} while True: data = self._call(endpoint, params=params) for item in data["items"]: yield from item["media_items"] next_max_id = data.get("next_max_id") if not next_max_id: return extr._update_cursor(None) params["max_id"] = extr._update_cursor(next_max_id) def _pagination_following(self, endpoint, params): extr = self.extractor params["max_id"] = text.parse_int(extr._init_cursor()) while True: data = self._call(endpoint, params=params) yield from data["users"] next_max_id = data.get("next_max_id") if not next_max_id: return extr._update_cursor(None) params["max_id"] = extr._update_cursor(next_max_id) class InstagramGraphqlAPI(): def __init__(self, extractor): self.extractor = extractor self.user_collection = self.user_saved = self.reels_media = \ self.highlights_media = self.guide = self.guide_media = \ self._unsupported self._json_dumps = util.json_dumps api = InstagramRestAPI(extractor) self.user_by_name = api.user_by_name self.user_by_id = api.user_by_id self.user_id = api.user_id def _unsupported(self, _=None): raise exception.AbortExtraction("Unsupported with GraphQL API") def highlights_tray(self, user_id): query_hash = "d4d88dc1500312af6f937f7b804c68c3" variables = { "user_id": user_id, "include_chaining": False, "include_reel": False, "include_suggested_users": False, "include_logged_out_extras": True, "include_highlight_reels": True, "include_live_status": False, } edges = (self._call(query_hash, variables)["user"] ["edge_highlight_reels"]["edges"]) return [edge["node"] for edge in edges] def media(self, shortcode): query_hash = "9f8827793ef34641b2fb195d4d41151c" variables = { "shortcode": shortcode, "child_comment_count": 3, "fetch_comment_count": 40, "parent_comment_count": 24, "has_threaded_comments": True, } media = self._call(query_hash, variables).get("shortcode_media") return (media,) if media else () def tags_media(self, tag): query_hash = "9b498c08113f1e09617a1703c22b2f32" variables = {"tag_name": text.unescape(tag), "first": 24} return self._pagination(query_hash, variables, "hashtag", "edge_hashtag_to_media") def user_clips(self, user_id): query_hash = "bc78b344a68ed16dd5d7f264681c4c76" variables = {"id": user_id, "first": 24} return self._pagination(query_hash, variables) def user_feed(self, user_id): query_hash = "69cba40317214236af40e7efa697781d" variables = {"id": user_id, "first": 24} return self._pagination(query_hash, variables) def user_tagged(self, user_id): query_hash = "be13233562af2d229b008d2976b998b5" variables = {"id": user_id, "first": 24} return self._pagination(query_hash, variables) def _call(self, 
query_hash, variables): extr = self.extractor url = "https://www.instagram.com/graphql/query/" params = { "query_hash": query_hash, "variables" : self._json_dumps(variables), } headers = { "Accept" : "*/*", "X-CSRFToken" : extr.csrf_token, "X-Instagram-AJAX": "1006267176", "X-IG-App-ID" : "936619743392459", "X-ASBD-ID" : "198387", "X-IG-WWW-Claim" : extr.www_claim, "X-Requested-With": "XMLHttpRequest", "Referer" : extr.root + "/", } return extr.request_json(url, params=params, headers=headers)["data"] def _pagination(self, query_hash, variables, key_data="user", key_edge=None): extr = self.extractor variables["after"] = extr._init_cursor() while True: data = self._call(query_hash, variables)[key_data] data = data[key_edge] if key_edge else next(iter(data.values())) for edge in data["edges"]: yield edge["node"] info = data["page_info"] if not info["has_next_page"]: return extr._update_cursor(None) elif not data["edges"]: user = self.extractor.item s = "" if user.endswith("s") else "s" raise exception.AbortExtraction( f"{user}'{s} posts are private") variables["after"] = extr._update_cursor(info["end_cursor"]) @cache(maxage=90*86400, keyarg=1) def _login_impl(extr, username, password): extr.log.error("Login with username & password is no longer supported. " "Use browser cookies instead.") return {} def id_from_shortcode(shortcode): return util.bdecode(shortcode, _ALPHABET) def shortcode_from_id(post_id): return util.bencode(int(post_id), _ALPHABET) _ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ" "abcdefghijklmnopqrstuvwxyz" "0123456789-_") ���������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1753336256.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.30.2/gallery_dl/extractor/issuu.py�����������������������������������������������������0000644�0001750�0001750�00000005310�15040344700�020256� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2019-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. 
"""Extractors for https://issuu.com/""" from .common import GalleryExtractor, Extractor, Message from .. import text, util class IssuuBase(): """Base class for issuu extractors""" category = "issuu" root = "https://issuu.com" class IssuuPublicationExtractor(IssuuBase, GalleryExtractor): """Extractor for a single publication""" subcategory = "publication" directory_fmt = ("{category}", "{document[username]}", "{document[date]:%Y-%m-%d} {document[title]}") filename_fmt = "{num:>03}.{extension}" archive_fmt = "{document[publicationId]}_{num}" pattern = r"(?:https?://)?issuu\.com(/[^/?#]+/docs/[^/?#]+)" example = "https://issuu.com/issuu/docs/TITLE/" def metadata(self, page): data = text.extr( page, '{\\"documentTextVersion\\":', ']\\n"])</script>') data = util.json_loads(text.unescape( '{"":' + data.replace('\\"', '"'))) doc = data["initialDocumentData"]["document"] doc["date"] = text.parse_datetime( doc["originalPublishDateInISOString"], "%Y-%m-%dT%H:%M:%S.%fZ") self.count = text.parse_int(doc["pageCount"]) self.base = (f"https://image.isu.pub/{doc['revisionId']}-" f"{doc['publicationId']}/jpg/page_") return {"document": doc} def images(self, page): return [(f"{self.base}{i}.jpg", None) for i in range(1, self.count + 1)] class IssuuUserExtractor(IssuuBase, Extractor): """Extractor for all publications of a user/publisher""" subcategory = "user" pattern = r"(?:https?://)?issuu\.com/([^/?#]+)(?:/(\d*))?$" example = "https://issuu.com/USER" def items(self): user, pnum = self.groups base = self.root + "/" + user pnum = text.parse_int(pnum, 1) while True: url = base + "/" + str(pnum) if pnum > 1 else base try: html = self.request(url).text data = text.extr(html, '\\"docs\\":', '}]\\n"]') docs = util.json_loads(data.replace('\\"', '"')) except Exception as exc: self.log.debug("", exc_info=exc) return for publication in docs: url = self.root + "/" + publication["uri"] publication["_extractor"] = IssuuPublicationExtractor yield Message.Queue, url, publication if len(docs) < 48: return pnum += 1 ������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1753637634.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.30.2/gallery_dl/extractor/itaku.py�����������������������������������������������������0000644�0001750�0001750�00000024707�15041461402�020236� 
0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2022-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://itaku.ee/""" from .common import Extractor, Message, Dispatch from ..cache import memcache from .. import text, util BASE_PATTERN = r"(?:https?://)?itaku\.ee" USER_PATTERN = BASE_PATTERN + r"/profile/([^/?#]+)" class ItakuExtractor(Extractor): """Base class for itaku extractors""" category = "itaku" root = "https://itaku.ee" directory_fmt = ("{category}", "{owner_username}") filename_fmt = ("{id}{title:? //}.{extension}") archive_fmt = "{id}" request_interval = (0.5, 1.5) def _init(self): self.api = ItakuAPI(self) self.videos = self.config("videos", True) def items(self): if images := self.images(): for image in images: image["date"] = text.parse_datetime( image["date_added"], "%Y-%m-%dT%H:%M:%S.%fZ") for category, tags in image.pop("categorized_tags").items(): image[f"tags_{category.lower()}"] = [ t["name"] for t in tags] image["tags"] = [t["name"] for t in image["tags"]] sections = [] for s in image["sections"]: if group := s["group"]: sections.append(f"{group['title']}/{s['title']}") else: sections.append(s["title"]) image["sections"] = sections if self.videos and image["video"]: url = image["video"]["video"] else: url = image["image"] yield Message.Directory, image yield Message.Url, url, text.nameext_from_url(url, image) return if posts := self.posts(): for post in posts: images = post.pop("gallery_images") or () post["count"] = len(images) post["date"] = text.parse_datetime( post["date_added"], "%Y-%m-%dT%H:%M:%S.%fZ") post["tags"] = [t["name"] for t in post["tags"]] yield Message.Directory, post for post["num"], image in enumerate(images, 1): post["file"] = image image["date"] = text.parse_datetime( image["date_added"], "%Y-%m-%dT%H:%M:%S.%fZ") url = image["image"] yield Message.Url, url, text.nameext_from_url(url, post) return if users := self.users(): base = f"{self.root}/profile/" for user in users: url = f"{base}{user['owner_username']}" user["_extractor"] = ItakuUserExtractor yield Message.Queue, url, user return images = posts = users = util.noop class ItakuGalleryExtractor(ItakuExtractor): """Extractor for an itaku user's gallery""" subcategory = "gallery" pattern = USER_PATTERN + r"/gallery(?:/(\d+))?" example = "https://itaku.ee/profile/USER/gallery" def images(self): user, section = self.groups return self.api.galleries_images({ "owner" : self.api.user_id(user), "sections": section, }) class ItakuPostsExtractor(ItakuExtractor): """Extractor for an itaku user's posts""" subcategory = "posts" directory_fmt = ("{category}", "{owner_username}", "Posts", "{id}{title:? //}") filename_fmt = "{file[id]}{file[title]:? //}.{extension}" archive_fmt = "{id}_{file[id]}" pattern = USER_PATTERN + r"/posts(?:/(\d+))?" 
example = "https://itaku.ee/profile/USER/posts" def posts(self): user, folder = self.groups return self.api.posts({ "owner" : self.api.user_id(user), "folders": folder, }) class ItakuStarsExtractor(ItakuExtractor): """Extractor for an itaku user's starred images""" subcategory = "stars" pattern = USER_PATTERN + r"/stars(?:/(\d+))?" example = "https://itaku.ee/profile/USER/stars" def images(self): user, section = self.groups return self.api.galleries_images({ "stars_of": self.api.user_id(user), "sections": section, "ordering": "-like_date", }, "/user_starred_imgs") class ItakuFollowingExtractor(ItakuExtractor): subcategory = "following" pattern = USER_PATTERN + r"/following" example = "https://itaku.ee/profile/USER/following" def users(self): return self.api.user_profiles({ "followed_by": self.api.user_id(self.groups[0]), }) class ItakuFollowersExtractor(ItakuExtractor): subcategory = "followers" pattern = USER_PATTERN + r"/followers" example = "https://itaku.ee/profile/USER/followers" def users(self): return self.api.user_profiles({ "followers_of": self.api.user_id(self.groups[0]), }) class ItakuBookmarksExtractor(ItakuExtractor): """Extractor for an itaku bookmarks folder""" subcategory = "bookmarks" pattern = USER_PATTERN + r"/bookmarks/(image|user)/(\d+)" example = "https://itaku.ee/profile/USER/bookmarks/image/12345" def _init(self): if self.groups[1] == "user": self.images = util.noop ItakuExtractor._init(self) def images(self): return self.api.galleries_images({ "bookmark_folder": self.groups[2], }) def users(self): return self.api.user_profiles({ "bookmark_folder": self.groups[2], }) class ItakuUserExtractor(Dispatch, ItakuExtractor): """Extractor for itaku user profiles""" pattern = USER_PATTERN + r"/?(?:$|\?|#)" example = "https://itaku.ee/profile/USER" def items(self): base = f"{self.root}/profile/{self.groups[0]}/" return self._dispatch_extractors(( (ItakuGalleryExtractor , base + "gallery"), (ItakuPostsExtractor , base + "posts"), (ItakuFollowersExtractor, base + "followers"), (ItakuFollowingExtractor, base + "following"), (ItakuStarsExtractor , base + "stars"), ), ("gallery",)) class ItakuImageExtractor(ItakuExtractor): subcategory = "image" pattern = BASE_PATTERN + r"/images/(\d+)" example = "https://itaku.ee/images/12345" def images(self): return (self.api.image(self.groups[0]),) class ItakuPostExtractor(ItakuExtractor): subcategory = "post" directory_fmt = ("{category}", "{owner_username}", "Posts", "{id}{title:? //}") filename_fmt = "{file[id]}{file[title]:? 
//}.{extension}" archive_fmt = "{id}_{file[id]}" pattern = BASE_PATTERN + r"/posts/(\d+)" example = "https://itaku.ee/posts/12345" def posts(self): return (self.api.post(self.groups[0]),) class ItakuSearchExtractor(ItakuExtractor): subcategory = "search" pattern = BASE_PATTERN + r"/home/images/?\?([^#]+)" example = "https://itaku.ee/home/images?tags=SEARCH" def images(self): required_tags = [] negative_tags = [] optional_tags = [] params = text.parse_query_list( self.groups[0], {"tags", "maturity_rating"}) if tags := params.pop("tags", None): for tag in tags: if not tag: pass elif tag[0] == "-": negative_tags.append(tag[1:]) elif tag[0] == "~": optional_tags.append(tag[1:]) else: required_tags.append(tag) return self.api.galleries_images({ "required_tags": required_tags, "negative_tags": negative_tags, "optional_tags": optional_tags, }) class ItakuAPI(): def __init__(self, extractor): self.extractor = extractor self.root = extractor.root + "/api" self.headers = { "Accept": "application/json, text/plain, */*", } def galleries_images(self, params, path=""): endpoint = f"/galleries/images{path}/" params = { "cursor" : None, "date_range": "", "maturity_rating": ("SFW", "Questionable", "NSFW"), "ordering" : "-date_added", "page" : "1", "page_size" : "30", "visibility": ("PUBLIC", "PROFILE_ONLY"), **params, } return self._pagination(endpoint, params, self.image) def posts(self, params): endpoint = "/posts/" params = { "cursor" : None, "date_range": "", "maturity_rating": ("SFW", "Questionable", "NSFW"), "ordering" : "-date_added", "page" : "1", "page_size" : "30", **params, } return self._pagination(endpoint, params) def user_profiles(self, params): endpoint = "/user_profiles/" params = { "cursor" : None, "ordering" : "-date_added", "page" : "1", "page_size": "50", "sfw_only" : "false", **params, } return self._pagination(endpoint, params) def image(self, image_id): endpoint = f"/galleries/images/{image_id}/" return self._call(endpoint) def post(self, post_id): endpoint = f"/posts/{post_id}/" return self._call(endpoint) @memcache(keyarg=1) def user(self, username): return self._call(f"/user_profiles/{username}/") def user_id(self, username): if username.startswith("id:"): return int(username[3:]) return self.user(username)["owner"] def _call(self, endpoint, params=None): if not endpoint.startswith("http"): endpoint = self.root + endpoint return self.extractor.request_json( endpoint, params=params, headers=self.headers) def _pagination(self, endpoint, params, extend=None): data = self._call(endpoint, params) while True: if extend is None: yield from data["results"] else: for result in data["results"]: yield extend(result["id"]) url_next = data["links"].get("next") if not url_next: return data = self._call(url_next) ���������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1753336256.0 
gallery_dl-1.30.2/gallery_dl/extractor/itchio.py

# -*- coding: utf-8 -*-

# Copyright 2023 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for https://itch.io/"""

from .common import Extractor, Message
from .. import text


class ItchioGameExtractor(Extractor):
    """Extractor for itch.io games"""
    category = "itchio"
    subcategory = "game"
    root = "https://itch.io"
    directory_fmt = ("{category}", "{user[name]}")
    filename_fmt = "{game[title]} ({id}).{extension}"
    archive_fmt = "{id}"
    pattern = r"(?:https?://)?(\w+)\.itch\.io/([\w-]+)"
    example = "https://USER.itch.io/GAME"

    def __init__(self, match):
        self.user, self.slug = match.groups()
        Extractor.__init__(self, match)

    def items(self):
        game_url = f"https://{self.user}.itch.io/{self.slug}"
        page = self.request(game_url).text

        params = {
            "source": "view_game",
            "as_props": "1",
            "after_download_lightbox": "true",
        }
        headers = {
            "Referer": game_url,
            "X-Requested-With": "XMLHttpRequest",
            "Origin": f"https://{self.user}.itch.io",
        }
        data = {
            "csrf_token": text.unquote(self.cookies["itchio_token"]),
        }

        for upload_id in text.extract_iter(page, 'data-upload_id="', '"'):
            file_url = f"{game_url}/file/{upload_id}"
            info = self.request_json(file_url, method="POST", params=params,
                                     headers=headers, data=data)

            game = info["lightbox"]["game"]
            user = info["lightbox"]["user"]
            game["url"] = game_url
            user.pop("follow_button", None)
            game = {"game": game, "user": user, "id": upload_id}

            url = info["url"]
            yield Message.Directory, game
            yield Message.Url, url, text.nameext_from_url(url, game)
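For reference, the upload/download handshake ItchioGameExtractor.items() performs — a POST to <game_url>/file/<upload_id> carrying the page's CSRF token, which returns a one-time download URL in its JSON reply — can be exercised outside the extractor roughly as below. This is a hedged sketch, not gallery-dl code; USER, GAME, UPLOAD_ID, and CSRF_TOKEN are placeholders, and the token value would have to come from an itchio_token browser cookie.

# Illustrative sketch only: replay the extractor's file-URL request with requests.
import requests

game_url = "https://USER.itch.io/GAME"          # placeholder game page
resp = requests.post(
    f"{game_url}/file/UPLOAD_ID",               # placeholder upload id from the page
    params={"source": "view_game", "as_props": "1",
            "after_download_lightbox": "true"},
    headers={"Referer": game_url,
             "X-Requested-With": "XMLHttpRequest",
             "Origin": "https://USER.itch.io"},
    data={"csrf_token": "CSRF_TOKEN"},          # value of the 'itchio_token' cookie
)
print(resp.json()["url"])                       # the direct file URL the extractor yields

The "url" field of that JSON reply is the same link the extractor emits as Message.Url for each upload.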
����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.30.2/gallery_dl/extractor/iwara.py�����������������������������������������������������0000644�0001750�0001750�00000036375�15040344700�020230� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.iwara.tv/""" from .common import Extractor, Message, Dispatch from .. import text, util, exception from ..cache import cache, memcache import hashlib BASE_PATTERN = r"(?:https?://)?(?:www\.)?iwara\.tv" USER_PATTERN = rf"{BASE_PATTERN}/profile/([^/?#]+)" class IwaraExtractor(Extractor): """Base class for iwara.tv extractors""" category = "iwara" root = "https://www.iwara.tv" directory_fmt = ("{category}", "{user[name]}") filename_fmt = "{date} {id} {title[:200]} {filename}.{extension}" archive_fmt = "{type} {user[name]} {id} {file_id}" def _init(self): self.api = IwaraAPI(self) def items_image(self, images, user=None): for image in images: try: if "image" in image: # could extract 'date_favorited' here image = image["image"] if not (files := image.get("files")): image = self.api.image(image["id"]) files = image["files"] group_info = self.extract_media_info(image, "file", False) group_info["user"] = (self.extract_user_info(image) if user is None else user) except Exception as exc: self.status |= 1 self.log.error("Failed to process image %s (%s: %s)", image["id"], exc.__class__.__name__, exc) continue group_info["count"] = len(files) yield Message.Directory, group_info for num, file in enumerate(files, 1): file_info = self.extract_media_info(file, None) file_id = file_info["file_id"] url = (f"https://i.iwara.tv/image/original/" f"{file_id}/{file_id}.{file_info['extension']}") yield Message.Url, url, {**file_info, **group_info, "num": num} def items_video(self, videos, user=None): for video in videos: try: if "video" in video: video = video["video"] if "fileUrl" not in video: video = self.api.video(video["id"]) file_url = video["fileUrl"] sources = self.api.source(file_url) source = next((s for s in sources if s.get("name") == "Source"), None) download_url = source.get('src', {}).get('download') info = self.extract_media_info(video, "file") info["count"] = info["num"] = 1 info["user"] = (self.extract_user_info(video) if user is None else user) except Exception as exc: self.status |= 1 self.log.error("Failed to process video %s (%s: %s)", video["id"], exc.__class__.__name__, exc) continue yield Message.Directory, info yield Message.Url, f"https:{download_url}", info def items_user(self, users, key=None): base = f"{self.root}/profile/" for user in users: if key is not None: user 
= user[key] if (username := user["username"]) is None: continue user["type"] = "user" user["_extractor"] = IwaraUserExtractor yield Message.Queue, f"{base}{username}", user def items_by_type(self, type, results): if type == "image": return self.items_image(results) if type == "video": return self.items_video(results) if type == "user": return self.items_user(results) raise exception.AbortExtraction(f"Unsupported result type '{type}'") def extract_media_info(self, item, key, include_file_info=True): title = t.strip() if (t := item.get("title")) else "" if include_file_info: file_info = item if key is None else item.get(key) or {} filename, _, extension = file_info.get("name", "").rpartition(".") return { "id" : item["id"], "file_id" : file_info.get("id"), "title" : title, "filename" : filename, "extension": extension, "date" : text.parse_datetime( file_info.get("createdAt"), "%Y-%m-%dT%H:%M:%S.%fZ"), "date_updated": text.parse_datetime( file_info.get("updatedAt"), "%Y-%m-%dT%H:%M:%S.%fZ"), "mime" : file_info.get("mime"), "size" : file_info.get("size"), "width" : file_info.get("width"), "height" : file_info.get("height"), "duration" : file_info.get("duration"), "type" : file_info.get("type"), } else: return { "id" : item["id"], "title": title, } def extract_user_info(self, profile): user = profile.get("user") or {} return { "id" : user.get("id"), "name" : user.get("username"), "nick" : user.get("name").strip(), "status" : user.get("status"), "role" : user.get("role"), "premium": user.get("premium"), "date" : text.parse_datetime( user.get("createdAt"), "%Y-%m-%dT%H:%M:%S.000Z"), "description": profile.get("body"), } def _user_params(self): user, qs = self.groups params = text.parse_query(qs) profile = self.api.profile(user) params["user"] = profile["user"]["id"] return self.extract_user_info(profile), params class IwaraUserExtractor(Dispatch, IwaraExtractor): """Extractor for iwara.tv profile pages""" pattern = rf"{USER_PATTERN}/?$" example = "https://www.iwara.tv/profile/USERNAME" def items(self): base = f"{self.root}/profile/{self.groups[0]}/" return self._dispatch_extractors(( (IwaraUserImagesExtractor , f"{base}images"), (IwaraUserVideosExtractor , f"{base}videos"), (IwaraUserPlaylistsExtractor, f"{base}playlists"), ), ("user-images", "user-videos")) class IwaraUserImagesExtractor(IwaraExtractor): subcategory = "user-images" pattern = rf"{USER_PATTERN}/images(?:\?([^#]+))?" example = "https://www.iwara.tv/profile/USERNAME/images" def items(self): user, params = self._user_params() return self.items_image(self.api.images(params), user) class IwaraUserVideosExtractor(IwaraExtractor): subcategory = "user-videos" pattern = rf"{USER_PATTERN}/videos(?:\?([^#]+))?" example = "https://www.iwara.tv/profile/USERNAME/videos" def items(self): user, params = self._user_params() return self.items_video(self.api.videos(params), user) class IwaraUserPlaylistsExtractor(IwaraExtractor): subcategory = "user-playlists" pattern = rf"{USER_PATTERN}/playlists(?:\?([^#]+))?" 
example = "https://www.iwara.tv/profile/USERNAME/playlists" def items(self): base = f"{self.root}/playlist/" for playlist in self.api.playlists(self._user_params()[1]): playlist["type"] = "playlist" playlist["_extractor"] = IwaraPlaylistExtractor url = f"{base}{playlist['id']}" yield Message.Queue, url, playlist class IwaraFollowingExtractor(IwaraExtractor): subcategory = "following" pattern = rf"{USER_PATTERN}/following" example = "https://www.iwara.tv/profile/USERNAME/following" def items(self): uid = self.api.profile(self.groups[0])["user"]["id"] return self.items_user(self.api.user_following(uid), "user") class IwaraFollowersExtractor(IwaraExtractor): subcategory = "followers" pattern = rf"{USER_PATTERN}/followers" example = "https://www.iwara.tv/profile/USERNAME/followers" def items(self): uid = self.api.profile(self.groups[0])["user"]["id"] return self.items_user(self.api.user_followers(uid), "follower") class IwaraImageExtractor(IwaraExtractor): """Extractor for individual iwara.tv image pages""" subcategory = "image" pattern = rf"{BASE_PATTERN}/image/([^/?#]+)" example = "https://www.iwara.tv/image/ID" def items(self): return self.items_image((self.api.image(self.groups[0]),)) class IwaraVideoExtractor(IwaraExtractor): """Extractor for individual iwara.tv videos""" subcategory = "video" pattern = rf"{BASE_PATTERN}/video/([^/?#]+)" example = "https://www.iwara.tv/video/ID" def items(self): return self.items_video((self.api.video(self.groups[0]),)) class IwaraPlaylistExtractor(IwaraExtractor): """Extractor for individual iwara.tv playlist pages""" subcategory = "playlist" pattern = rf"{BASE_PATTERN}/playlist/([^/?#]+)" example = "https://www.iwara.tv/playlist/ID" def items(self): return self.items_video(self.api.playlist(self.groups[0])) class IwaraFavoriteExtractor(IwaraExtractor): subcategory = "favorite" pattern = rf"{BASE_PATTERN}/favorites(?:/(image|video)s)?" example = "https://www.iwara.tv/favorites/videos" def items(self): type = self.groups[0] or "vidoo" return self.items_by_type(type, self.api.favorites(type)) class IwaraSearchExtractor(IwaraExtractor): """Extractor for iwara.tv search pages""" subcategory = "search" pattern = rf"{BASE_PATTERN}/search\?([^#]+)" example = "https://www.iwara.tv/search?query=QUERY&type=TYPE" def items(self): params = text.parse_query(self.groups[0]) type = params.get("type") self.kwdict["search_tags"] = query = params.get("query") return self.items_by_type(type, self.api.search(type, query)) class IwaraTagExtractor(IwaraExtractor): """Extractor for iwara.tv tag search""" subcategory = "tag" pattern = rf"{BASE_PATTERN}/(image|video)s(?:\?([^#]+))?" 
example = "https://www.iwara.tv/videos?tags=TAGS" def items(self): type, qs = self.groups params = text.parse_query(qs) self.kwdict["search_tags"] = params.get("tags") return self.items_by_type(type, self.api.media(type, params)) class IwaraAPI(): """Interface for the Iwara API""" root = "https://api.iwara.tv" def __init__(self, extractor): self.extractor = extractor self.headers = { "Referer" : f"{extractor.root}/", "Content-Type": "application/json", "Origin" : extractor.root, } self.username, self.password = extractor._get_auth_info() if not self.username: self.authenticate = util.noop def image(self, image_id): endpoint = f"/image/{image_id}" return self._call(endpoint) def video(self, video_id): endpoint = f"/video/{video_id}" return self._call(endpoint) def playlist(self, playlist_id): endpoint = f"/playlist/{playlist_id}" return self._pagination(endpoint) def detail(self, media): endpoint = f"/{media['type']}/{media['id']}" return self._call(endpoint) def images(self, params): endpoint = "/images" params.setdefault("rating", "all") return self._pagination(endpoint, params) def videos(self, params): endpoint = "/videos" params.setdefault("rating", "all") return self._pagination(endpoint, params) def playlists(self, params): endpoint = "/playlists" return self._pagination(endpoint, params) def media(self, type, params): endpoint = f"/{type}s" params.setdefault("rating", "all") return self._pagination(endpoint, params) def favorites(self, type): if not self.username: raise exception.AuthRequired("'username' & 'password'") endpoint = f"/favorites/{type}s" return self._pagination(endpoint) def search(self, type, query): endpoint = "/search" params = {"type": type, "query": query} return self._pagination(endpoint, params) @memcache(keyarg=1) def profile(self, username): endpoint = f"/profile/{username}" return self._call(endpoint) def user_following(self, user_id): endpoint = f"/user/{user_id}/following" return self._pagination(endpoint) def user_followers(self, user_id): endpoint = f"/user/{user_id}/followers" return self._pagination(endpoint) def source(self, file_url): base, _, query = file_url.partition("?") if not (expires := text.extr(query, "expires=", "&")): return () file_id = base.rpartition("/")[2] sha_postfix = "5nFp9kmbNnHdAFhaqMvt" sha_key = f"{file_id}_{expires}_{sha_postfix}" hash = hashlib.sha1(sha_key.encode()).hexdigest() headers = {"X-Version": hash, **self.headers} return self.extractor.request_json(file_url, headers=headers) def authenticate(self): self.headers["Authorization"] = self._authenticate_impl(self.username) @cache(maxage=3600, keyarg=1) def _authenticate_impl(self, username): refresh_token = _refresh_token_cache(username) if refresh_token is None: self.extractor.log.info("Logging in as %s", username) url = f"{self.root}/user/login" json = { "email" : username, "password": self.password } data = self.extractor.request_json( url, method="POST", headers=self.headers, json=json, fatal=False) if not (refresh_token := data.get("token")): self.extractor.log.debug(data) raise exception.AuthenticationError(data.get("message")) _refresh_token_cache.update(username, refresh_token) self.extractor.log.info("Refreshing access token for %s", username) url = f"{self.root}/user/token" headers = {"Authorization": f"Bearer {refresh_token}", **self.headers} data = self.extractor.request_json( url, method="POST", headers=headers, fatal=False) if not (access_token := data.get("accessToken")): self.extractor.log.debug(data) raise 
exception.AuthenticationError(data.get("message")) return f"Bearer {access_token}" def _call(self, endpoint, params=None, headers=None): if headers is None: headers = self.headers url = self.root + endpoint self.authenticate() return self.extractor.request_json(url, params=params, headers=headers) def _pagination(self, endpoint, params=None): if params is None: params = {} params["page"] = 0 params["limit"] = 50 while True: data = self._call(endpoint, params) if not (results := data.get("results")): break yield from results if len(results) < params["limit"]: break params["page"] += 1 @cache(maxage=28*86400, keyarg=0) def _refresh_token_cache(username): return None �������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1753336256.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.30.2/gallery_dl/extractor/jschan.py����������������������������������������������������0000644�0001750�0001750�00000004527�15040344700�020365� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for jschan Imageboards""" from .common import BaseExtractor, Message from .. 
import text import itertools class JschanExtractor(BaseExtractor): basecategory = "jschan" BASE_PATTERN = JschanExtractor.update({ "94chan": { "root": "https://94chan.org", "pattern": r"94chan\.org" } }) class JschanThreadExtractor(JschanExtractor): """Extractor for jschan threads""" subcategory = "thread" directory_fmt = ("{category}", "{board}", "{threadId} {subject|nomarkup[:50]}") filename_fmt = "{postId}{num:?-//} {filename}.{extension}" archive_fmt = "{board}_{postId}_{num}" pattern = BASE_PATTERN + r"/([^/?#]+)/thread/(\d+)\.html" example = "https://94chan.org/a/thread/12345.html" def items(self): url = f"{self.root}/{self.groups[-2]}/thread/{self.groups[-1]}.json" thread = self.request_json(url) thread["threadId"] = thread["postId"] posts = thread.pop("replies", ()) yield Message.Directory, thread for post in itertools.chain((thread,), posts): if files := post.pop("files", ()): thread.update(post) thread["count"] = len(files) for num, file in enumerate(files): url = f"{self.root}/file/{file['filename']}" file.update(thread) file["num"] = num file["siteFilename"] = file["filename"] text.nameext_from_url(file["originalFilename"], file) yield Message.Url, url, file class JschanBoardExtractor(JschanExtractor): """Extractor for jschan boards""" subcategory = "board" pattern = (BASE_PATTERN + r"/([^/?#]+)" r"(?:/index\.html|/catalog\.html|/\d+\.html|/?$)") example = "https://94chan.org/a/" def items(self): board = self.groups[-1] url = f"{self.root}/{board}/catalog.json" for thread in self.request_json(url): url = f"{self.root}/{board}/thread/{thread['postId']}.html" thread["_extractor"] = JschanThreadExtractor yield Message.Queue, url, thread �������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1753336256.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.30.2/gallery_dl/extractor/kabeuchi.py��������������������������������������������������0000644�0001750�0001750�00000005174�15040344700�020671� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2020-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General 
Public License version 2 as # published by the Free Software Foundation. """Extractors for https://kabe-uchiroom.com/""" from .common import Extractor, Message from .. import text, exception class KabeuchiUserExtractor(Extractor): """Extractor for all posts of a user on kabe-uchiroom.com""" category = "kabeuchi" subcategory = "user" directory_fmt = ("{category}", "{twitter_user_id} {twitter_id}") filename_fmt = "{id}_{num:>02}{title:?_//}.{extension}" archive_fmt = "{id}_{num}" root = "https://kabe-uchiroom.com" pattern = r"(?:https?://)?kabe-uchiroom\.com/mypage/?\?id=(\d+)" example = "https://kabe-uchiroom.com/mypage/?id=12345" def items(self): uid = self.groups[0] base = f"{self.root}/accounts/upfile/{uid[-1]}/{uid}/" keys = ("image1", "image2", "image3", "image4", "image5", "image6") for post in self.posts(uid): if post.get("is_ad") or not post["image1"]: continue post["date"] = text.parse_datetime( post["created_at"], "%Y-%m-%d %H:%M:%S") yield Message.Directory, post for key in keys: name = post[key] if not name: break url = base + name post["num"] = ord(key[-1]) - 48 yield Message.Url, url, text.nameext_from_url(name, post) def posts(self, uid): url = f"{self.root}/mypage/?id={uid}" response = self.request(url) if response.history and response.url == self.root + "/": raise exception.NotFoundError("user") target_id = text.extr(response.text, 'user_friend_id = "', '"') return self._pagination(target_id) def _pagination(self, target_id): url = f"{self.root}/get_posts.php" data = { "user_id" : "0", "target_id" : target_id, "type" : "uploads", "sort_type" : "0", "category_id": "all", "latest_post": "", "page_num" : 0, } while True: info = self.request_json(url, method="POST", data=data) datas = info["datas"] if not datas or not isinstance(datas, list): return yield from datas last_id = datas[-1]["id"] if last_id == info["last_data"]: return data["latest_post"] = last_id data["page_num"] += 1 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1753336256.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.30.2/gallery_dl/extractor/keenspot.py��������������������������������������������������0000644�0001750�0001750�00000011001�15040344700�020730� 
0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2019-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for http://www.keenspot.com/""" from .common import Extractor, Message from .. import text class KeenspotComicExtractor(Extractor): """Extractor for webcomics from keenspot.com""" category = "keenspot" subcategory = "comic" directory_fmt = ("{category}", "{comic}") filename_fmt = "{filename}.{extension}" archive_fmt = "{comic}_{filename}" pattern = r"(?:https?://)?(?!www\.|forums\.)([\w-]+)\.keenspot\.com(/.+)?" example = "http://COMIC.keenspot.com/" def __init__(self, match): Extractor.__init__(self, match) self.comic = match[1].lower() self.path = match[2] self.root = "http://" + self.comic + ".keenspot.com" self._needle = "" self._image = 'class="ksc"' self._next = self._next_needle def items(self): data = {"comic": self.comic} yield Message.Directory, data with self.request(self.root + "/") as response: if response.history: url = response.request.url self.root = url[:url.index("/", 8)] page = response.text del response url = self._first(page) if self.path: url = self.root + self.path prev = None ilen = len(self._image) while url and url != prev: prev = url page = self.request(text.urljoin(self.root, url)).text pos = 0 while True: pos = page.find(self._image, pos) if pos < 0: break img, pos = text.extract(page, 'src="', '"', pos + ilen) if img.endswith(".js"): continue if img[0] == "/": img = self.root + img elif "youtube.com/" in img: img = "ytdl:" + img yield Message.Url, img, text.nameext_from_url(img, data) url = self._next(page) def _first(self, page): if self.comic == "brawlinthefamily": self._next = self._next_brawl self._image = '<div id="comic">' return "http://brawlinthefamily.keenspot.com/comic/theshowdown/" if url := text.extr(page, '<link rel="first" href="', '"'): if self.comic == "porcelain": self._needle = 'id="porArchivetop_"' else: self._next = self._next_link return url pos = page.find('id="first_day1"') if pos >= 0: self._next = self._next_id return text.rextr(page, 'href="', '"', pos) pos = page.find('>FIRST PAGE<') if pos >= 0: if self.comic == "lastblood": self._next = self._next_lastblood self._image = '<div id="comic">' else: self._next = self._next_id return text.rextr(page, 'href="', '"', pos) pos = page.find('<div id="kscomicpart"') if pos >= 0: self._needle = '<a href="/archive.html' return text.extract(page, 'href="', '"', pos)[0] pos = page.find('>First Comic<') # twokinds if pos >= 0: self._image = '</header>' self._needle = 'class="navarchive"' return text.rextr(page, 'href="', '"', pos) pos = page.find('id="flip_FirstDay"') # flipside if pos >= 0: self._image = 'class="flip_Pages ksc"' self._needle = 'id="flip_ArcButton"' return text.rextr(page, 'href="', '"', pos) self.log.error("Unrecognized page layout") return None def _next_needle(self, page): pos = page.index(self._needle) + len(self._needle) return text.extract(page, 'href="', '"', pos)[0] def _next_link(self, page): return text.extr(page, '<link rel="next" href="', '"') def _next_id(self, 
page): pos = page.find('id="next_') return text.rextr(page, 'href="', '"', pos) if pos >= 0 else None def _next_lastblood(self, page): pos = page.index("link rel='next'") return text.extract(page, "href='", "'", pos)[0] def _next_brawl(self, page): pos = page.index("comic-nav-next") url = text.rextr(page, 'href="', '"', pos) return None if "?random" in url else url �������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1753604140.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.30.2/gallery_dl/extractor/kemono.py����������������������������������������������������0000644�0001750�0001750�00000057665�15041360054�020423� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2021-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://kemono.cr/""" from .common import Extractor, Message from .. import text, util, exception from ..cache import cache, memcache import itertools import json BASE_PATTERN = (r"(?:https?://)?(?:www\.|beta\.)?" 
r"(kemono|coomer)\.(cr|s[tu]|party)") USER_PATTERN = BASE_PATTERN + r"/([^/?#]+)/user/([^/?#]+)" HASH_PATTERN = r"/[0-9a-f]{2}/[0-9a-f]{2}/([0-9a-f]{64})" class KemonoExtractor(Extractor): """Base class for kemono extractors""" category = "kemono" root = "https://kemono.cr" directory_fmt = ("{category}", "{service}", "{user}") filename_fmt = "{id}_{title[:180]}_{num:>02}_{filename[:180]}.{extension}" archive_fmt = "{service}_{user}_{id}_{num}" cookies_domain = ".kemono.cr" def __init__(self, match): if match[1] == "coomer": self.category = "coomer" self.root = "https://coomer.st" self.cookies_domain = ".coomer.st" Extractor.__init__(self, match) def _init(self): self.api = KemonoAPI(self) self.revisions = self.config("revisions") if self.revisions: self.revisions_unique = (self.revisions == "unique") order = self.config("order-revisions") self.revisions_reverse = order[0] in ("r", "a") if order else False self._find_inline = util.re( r'src="(?:https?://(?:kemono\.cr|coomer\.st))?(/inline/[^"]+' r'|/[0-9a-f]{2}/[0-9a-f]{2}/[0-9a-f]{64}\.[^"]+)').findall self._json_dumps = json.JSONEncoder( ensure_ascii=False, check_circular=False, sort_keys=True, separators=(",", ":")).encode def items(self): find_hash = util.re(HASH_PATTERN).match generators = self._build_file_generators(self.config("files")) announcements = True if self.config("announcements") else None archives = True if self.config("archives") else False comments = True if self.config("comments") else False dms = True if self.config("dms") else None max_posts = self.config("max-posts") creator_info = {} if self.config("metadata", True) else None exts_archive = util.EXTS_ARCHIVE if duplicates := self.config("duplicates"): if isinstance(duplicates, str): duplicates = set(duplicates.split(",")) elif isinstance(duplicates, (list, tuple)): duplicates = set(duplicates) else: duplicates = {"file", "attachment", "inline"} else: duplicates = () # prevent files from being sent with gzip compression headers = {"Accept-Encoding": "identity"} posts = self.posts() if max_posts: posts = itertools.islice(posts, max_posts) if self.revisions: posts = self._revisions(posts) for post in posts: headers["Referer"] = (f"{self.root}/{post['service']}/user/" f"{post['user']}/post/{post['id']}") post["_http_headers"] = headers post["date"] = self._parse_datetime( post.get("published") or post.get("added") or "") service = post["service"] creator_id = post["user"] if creator_info is not None: key = f"{service}_{creator_id}" if key not in creator_info: creator = creator_info[key] = self.api.creator_profile( service, creator_id) else: creator = creator_info[key] post["user_profile"] = creator post["username"] = creator["name"] if comments: try: post["comments"] = self.api.creator_post_comments( service, creator_id, post["id"]) except exception.HttpError: post["comments"] = () if dms is not None: if dms is True: dms = self.api.creator_dms( post["service"], post["user"]) try: dms = dms["props"]["dms"] except Exception: dms = () post["dms"] = dms if announcements is not None: if announcements is True: announcements = self.api.creator_announcements( post["service"], post["user"]) post["announcements"] = announcements files = [] hashes = set() post_archives = post["archives"] = [] for file in itertools.chain.from_iterable( g(post) for g in generators): url = file["path"] if "\\" in url: file["path"] = url = url.replace("\\", "/") if match := find_hash(url): file["hash"] = hash = match[1] if file["type"] not in duplicates and hash in hashes: self.log.debug("Skipping %s 
%s (duplicate)", file["type"], url) continue hashes.add(hash) else: file["hash"] = hash = "" if url[0] == "/": url = self.root + "/data" + url elif url.startswith(self.root): url = self.root + "/data" + url[20:] file["url"] = url text.nameext_from_url(file.get("name", url), file) ext = text.ext_from_url(url) if not file["extension"]: file["extension"] = ext elif ext == "txt" and file["extension"] != "txt": file["_http_validate"] = _validate elif ext in exts_archive: file["type"] = "archive" if archives: try: data = self.api.file(hash) data.update(file) post_archives.append(data) except Exception as exc: self.log.warning( "%s: Failed to retrieve archive metadata of " "'%s' (%s: %s)", post["id"], file.get("name"), exc.__class__.__name__, exc) post_archives.append(file.copy()) else: post_archives.append(file.copy()) files.append(file) post["count"] = len(files) yield Message.Directory, post for post["num"], file in enumerate(files, 1): if "id" in file: del file["id"] post.update(file) yield Message.Url, file["url"], post def login(self): username, password = self._get_auth_info() if username: self.cookies_update(self._login_impl( (username, self.cookies_domain), password)) @cache(maxage=3650*86400, keyarg=1) def _login_impl(self, username, password): username = username[0] self.log.info("Logging in as %s", username) url = self.root + "/api/v1/authentication/login" data = {"username": username, "password": password} response = self.request(url, method="POST", json=data, fatal=False) if response.status_code >= 400: try: msg = '"' + response.json()["error"] + '"' except Exception: msg = '"Username or password is incorrect"' raise exception.AuthenticationError(msg) return {c.name: c.value for c in response.cookies} def _file(self, post): file = post["file"] if not file or "path" not in file: return () file["type"] = "file" return (file,) def _attachments(self, post): for attachment in post["attachments"]: attachment["type"] = "attachment" return post["attachments"] def _inline(self, post): for path in self._find_inline(post.get("content") or ""): yield {"path": path, "name": path, "type": "inline"} def _build_file_generators(self, filetypes): if filetypes is None: return (self._attachments, self._file, self._inline) genmap = { "file" : self._file, "attachments": self._attachments, "inline" : self._inline, } if isinstance(filetypes, str): filetypes = filetypes.split(",") return [genmap[ft] for ft in filetypes] def _parse_datetime(self, date_string): if len(date_string) > 19: date_string = date_string[:19] return text.parse_datetime(date_string, "%Y-%m-%dT%H:%M:%S") def _revisions(self, posts): return itertools.chain.from_iterable( self._revisions_post(post) for post in posts) def _revisions_post(self, post): post["revision_id"] = 0 try: revs = self.api.creator_post_revisions( post["service"], post["user"], post["id"]) except exception.HttpError: post["revision_hash"] = self._revision_hash(post) post["revision_index"] = 1 post["revision_count"] = 1 return (post,) revs.insert(0, post) for rev in revs: rev["revision_hash"] = self._revision_hash(rev) if self.revisions_unique: uniq = [] last = None for rev in revs: if last != rev["revision_hash"]: last = rev["revision_hash"] uniq.append(rev) revs = uniq cnt = idx = len(revs) for rev in revs: rev["revision_index"] = idx rev["revision_count"] = cnt idx -= 1 if self.revisions_reverse: revs.reverse() return revs def _revisions_all(self, service, creator_id, post_id): revs = self.api.creator_post_revisions(service, creator_id, post_id) cnt = idx = len(revs) 
for rev in revs: rev["revision_hash"] = self._revision_hash(rev) rev["revision_index"] = idx rev["revision_count"] = cnt idx -= 1 if self.revisions_reverse: revs.reverse() return revs def _revision_hash(self, revision): rev = revision.copy() rev.pop("revision_id", None) rev.pop("added", None) rev.pop("next", None) rev.pop("prev", None) rev["file"] = rev["file"].copy() rev["file"].pop("name", None) rev["attachments"] = [a.copy() for a in rev["attachments"]] for a in rev["attachments"]: a.pop("name", None) return util.sha1(self._json_dumps(rev)) def _validate(response): return (response.headers["content-length"] != "9" or response.content != b"not found") class KemonoUserExtractor(KemonoExtractor): """Extractor for all posts from a kemono.cr user listing""" subcategory = "user" pattern = USER_PATTERN + r"/?(?:\?([^#]+))?(?:$|\?|#)" example = "https://kemono.cr/SERVICE/user/12345" def __init__(self, match): self.subcategory = match[3] KemonoExtractor.__init__(self, match) def posts(self): _, _, service, creator_id, query = self.groups params = text.parse_query(query) tag = params.get("tag") endpoint = self.config("endpoint") if endpoint == "legacy+": endpoint = self._posts_legacy_plus elif endpoint == "legacy" or tag: endpoint = self.api.creator_posts_legacy else: endpoint = self.api.creator_posts return endpoint(service, creator_id, params.get("o"), params.get("q"), tag) def _posts_legacy_plus(self, service, creator_id, offset=0, query=None, tags=None): for post in self.api.creator_posts_legacy( service, creator_id, offset, query, tags): yield self.api.creator_post( service, creator_id, post["id"])["post"] class KemonoPostsExtractor(KemonoExtractor): """Extractor for kemono.cr post listings""" subcategory = "posts" pattern = BASE_PATTERN + r"/posts()()(?:/?\?([^#]+))?" example = "https://kemono.cr/posts" def posts(self): params = text.parse_query(self.groups[4]) return self.api.posts( params.get("o"), params.get("q"), params.get("tag")) class KemonoPostExtractor(KemonoExtractor): """Extractor for a single kemono.cr post""" subcategory = "post" pattern = USER_PATTERN + r"/post/([^/?#]+)(/revisions?(?:/(\d*))?)?" 
example = "https://kemono.cr/SERVICE/user/12345/post/12345" def __init__(self, match): self.subcategory = match[3] KemonoExtractor.__init__(self, match) def posts(self): _, _, service, creator_id, post_id, revision, revision_id = self.groups post = self.api.creator_post(service, creator_id, post_id) if not revision: return (post["post"],) self.revisions = False revs = self._revisions_all(service, creator_id, post_id) if not revision_id: return revs for rev in revs: if str(rev["revision_id"]) == revision_id: return (rev,) raise exception.NotFoundError("revision") class KemonoDiscordExtractor(KemonoExtractor): """Extractor for kemono.cr discord servers""" subcategory = "discord" directory_fmt = ("{category}", "discord", "{server_id} {server}", "{channel_id} {channel}") filename_fmt = "{id}_{num:>02}_{filename}.{extension}" archive_fmt = "discord_{server_id}_{id}_{num}" pattern = BASE_PATTERN + r"/discord/server/(\d+)[/#](?:channel/)?(\d+)" example = "https://kemono.cr/discord/server/12345/12345" def items(self): _, _, server_id, channel_id = self.groups try: server, channels = discord_server_info(self, server_id) channel = channels[channel_id] except Exception: raise exception.NotFoundError("channel") data = { "server" : server["name"], "server_id" : server["id"], "channel" : channel["name"], "channel_id" : channel["id"], "channel_nsfw" : channel["is_nsfw"], "channel_type" : channel["type"], "channel_topic": channel["topic"], "parent_id" : channel["parent_channel_id"], } find_inline = util.re( r"https?://(?:cdn\.discordapp.com|media\.discordapp\.net)" r"(/[A-Za-z0-9-._~:/?#\[\]@!$&'()*+,;%=]+)").findall find_hash = util.re(HASH_PATTERN).match posts = self.api.discord_channel(channel_id) if max_posts := self.config("max-posts"): posts = itertools.islice(posts, max_posts) for post in posts: files = [] for attachment in post["attachments"]: match = find_hash(attachment["path"]) attachment["hash"] = match[1] if match else "" attachment["type"] = "attachment" files.append(attachment) for path in find_inline(post["content"] or ""): files.append({"path": "https://cdn.discordapp.com" + path, "name": path, "type": "inline", "hash": ""}) post.update(data) post["date"] = self._parse_datetime(post["published"]) post["count"] = len(files) yield Message.Directory, post for post["num"], file in enumerate(files, 1): post["hash"] = file["hash"] post["type"] = file["type"] url = file["path"] text.nameext_from_url(file.get("name", url), post) if not post["extension"]: post["extension"] = text.ext_from_url(url) if url[0] == "/": url = self.root + "/data" + url elif url.startswith(self.root): url = self.root + "/data" + url[20:] yield Message.Url, url, post class KemonoDiscordServerExtractor(KemonoExtractor): subcategory = "discord-server" pattern = BASE_PATTERN + r"/discord/server/(\d+)$" example = "https://kemono.cr/discord/server/12345" def items(self): server_id = self.groups[2] server, channels = discord_server_info(self, server_id) for channel in channels.values(): url = (f"{self.root}/discord/server/{server_id}/" f"{channel['id']}#{channel['name']}") yield Message.Queue, url, { "server" : server, "channel" : channel, "_extractor": KemonoDiscordExtractor, } @memcache(keyarg=1) def discord_server_info(extr, server_id): server = extr.api.discord_server(server_id) return server, { channel["id"]: channel for channel in server.pop("channels") } class KemonoFavoriteExtractor(KemonoExtractor): """Extractor for kemono.cr favorites""" subcategory = "favorite" pattern = BASE_PATTERN + 
r"/(?:account/)?favorites()()(?:/?\?([^#]+))?" example = "https://kemono.cr/account/favorites/artists" def items(self): self.login() params = text.parse_query(self.groups[4]) type = params.get("type") or self.config("favorites") or "artist" sort = params.get("sort") order = params.get("order") or "desc" if type == "artist": users = self.api.account_favorites("artist") if not sort: sort = "updated" users.sort(key=lambda x: x[sort] or util.NONE, reverse=(order == "desc")) for user in users: service = user["service"] if service == "discord": user["_extractor"] = KemonoDiscordServerExtractor url = f"{self.root}/discord/server/{user['id']}" else: user["_extractor"] = KemonoUserExtractor url = f"{self.root}/{service}/user/{user['id']}" yield Message.Queue, url, user elif type == "post": posts = self.api.account_favorites("post") if not sort: sort = "faved_seq" posts.sort(key=lambda x: x[sort] or util.NONE, reverse=(order == "desc")) for post in posts: post["_extractor"] = KemonoPostExtractor url = (f"{self.root}/{post['service']}/user/" f"{post['user']}/post/{post['id']}") yield Message.Queue, url, post class KemonoArtistsExtractor(KemonoExtractor): """Extractor for kemono artists""" subcategory = "artists" pattern = BASE_PATTERN + r"/artists(?:\?([^#]+))?" example = "https://kemono.cr/artists" def items(self): params = text.parse_query(self.groups[2]) users = self.api.creators() if params.get("service"): service = params["service"].lower() users = [user for user in users if user["service"] == service] if params.get("q"): q = params["q"].lower() users = [user for user in users if q in user["name"].lower()] sort = params.get("sort_by") or "favorited" order = params.get("order") or "desc" users.sort(key=lambda user: user[sort] or util.NONE, reverse=(order != "asc")) for user in users: service = user["service"] if service == "discord": user["_extractor"] = KemonoDiscordServerExtractor url = f"{self.root}/discord/server/{user['id']}" else: user["_extractor"] = KemonoUserExtractor url = f"{self.root}/{service}/user/{user['id']}" yield Message.Queue, url, user class KemonoAPI(): """Interface for the Kemono API v1.1.0 https://kemono.cr/documentation/api """ def __init__(self, extractor): self.extractor = extractor self.root = extractor.root + "/api/v1" def posts(self, offset=0, query=None, tags=None): endpoint = "/posts" params = {"q": query, "o": offset, "tag": tags} return self._pagination(endpoint, params, 50, "posts") def file(self, file_hash): endpoint = "/file/" + file_hash return self._call(endpoint) def creators(self): endpoint = "/creators.txt" return self._call(endpoint) def creator_posts(self, service, creator_id, offset=0, query=None, tags=None): endpoint = f"/{service}/user/{creator_id}" params = {"q": query, "tag": tags, "o": offset} return self._pagination(endpoint, params, 50) def creator_posts_legacy(self, service, creator_id, offset=0, query=None, tags=None): endpoint = f"/{service}/user/{creator_id}/posts-legacy" params = {"o": offset, "tag": tags, "q": query} return self._pagination(endpoint, params, 50, "results") def creator_announcements(self, service, creator_id): endpoint = f"/{service}/user/{creator_id}/announcements" return self._call(endpoint) def creator_dms(self, service, creator_id): endpoint = f"/{service}/user/{creator_id}/dms" return self._call(endpoint) def creator_fancards(self, service, creator_id): endpoint = f"/{service}/user/{creator_id}/fancards" return self._call(endpoint) def creator_post(self, service, creator_id, post_id): endpoint = 
f"/{service}/user/{creator_id}/post/{post_id}" return self._call(endpoint) def creator_post_comments(self, service, creator_id, post_id): endpoint = f"/{service}/user/{creator_id}/post/{post_id}/comments" return self._call(endpoint) def creator_post_revisions(self, service, creator_id, post_id): endpoint = f"/{service}/user/{creator_id}/post/{post_id}/revisions" return self._call(endpoint) def creator_profile(self, service, creator_id): endpoint = f"/{service}/user/{creator_id}/profile" return self._call(endpoint) def creator_links(self, service, creator_id): endpoint = f"/{service}/user/{creator_id}/links" return self._call(endpoint) def creator_tags(self, service, creator_id): endpoint = f"/{service}/user/{creator_id}/tags" return self._call(endpoint) def discord_channel(self, channel_id): endpoint = f"/discord/channel/{channel_id}" return self._pagination(endpoint, {}, 150) def discord_channel_lookup(self, server_id): endpoint = f"/discord/channel/lookup/{server_id}" return self._call(endpoint) def discord_server(self, server_id): endpoint = f"/discord/server/{server_id}" return self._call(endpoint) def account_favorites(self, type): endpoint = "/account/favorites" params = {"type": type} return self._call(endpoint, params) def _call(self, endpoint, params=None): url = self.root + endpoint response = self.extractor.request(url, params=params) return response.json() def _pagination(self, endpoint, params, batch=50, key=False): offset = text.parse_int(params.get("o")) params["o"] = offset - offset % batch while True: data = self._call(endpoint, params) if key: data = data.get(key) if not data: return yield from data if len(data) < batch: return params["o"] += batch ���������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1753336256.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.30.2/gallery_dl/extractor/khinsider.py�������������������������������������������������0000644�0001750�0001750�00000007362�15040344700�021077� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2016-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. 
"""Extractors for https://downloads.khinsider.com/""" from .common import Extractor, Message, AsynchronousMixin from .. import text, exception class KhinsiderSoundtrackExtractor(AsynchronousMixin, Extractor): """Extractor for soundtracks from khinsider.com""" category = "khinsider" subcategory = "soundtrack" root = "https://downloads.khinsider.com" directory_fmt = ("{category}", "{album[name]}") archive_fmt = "{filename}.{extension}" pattern = (r"(?:https?://)?downloads\.khinsider\.com" r"/game-soundtracks/album/([^/?#]+)") example = ("https://downloads.khinsider.com" "/game-soundtracks/album/TITLE") def __init__(self, match): Extractor.__init__(self, match) self.album = match[1] def items(self): url = self.root + "/game-soundtracks/album/" + self.album page = self.request(url, encoding="utf-8").text if "Download all songs at once:" not in page: raise exception.NotFoundError("soundtrack") data = self.metadata(page) yield Message.Directory, data if self.config("covers", False): for num, url in enumerate(self._extract_covers(page), 1): cover = text.nameext_from_url( url, {"url": url, "num": num, "type": "cover"}) cover.update(data) yield Message.Url, url, cover for track in self._extract_tracks(page): track.update(data) track["type"] = "track" yield Message.Url, track["url"], track def metadata(self, page): extr = text.extract_from(page) return {"album": { "name" : text.unescape(extr("<h2>", "<")), "platform": text.split_html(extr("Platforms: ", "<br>"))[::2], "year": extr("Year: <b>", "<"), "catalog": extr("Catalog Number: <b>", "<"), "developer": text.remove_html(extr(" Developed by: ", "</")), "publisher": text.remove_html(extr(" Published by: ", "</")), "count": text.parse_int(extr("Number of Files: <b>", "<")), "size" : text.parse_bytes(extr("Total Filesize: <b>", "<")[:-1]), "date" : extr("Date Added: <b>", "<"), "type" : text.remove_html(extr("Album type: <b>", "</b>")), "uploader": text.remove_html(extr("Uploaded by: ", "</")), }} def _extract_tracks(self, page): fmt = self.config("format", ("mp3",)) if fmt and isinstance(fmt, str): if fmt == "all": fmt = None else: fmt = fmt.lower().split(",") page = text.extr(page, '<table id="songlist">', '</table>') for num, url in enumerate(text.extract_iter( page, '<td class="clickable-row"><a href="', '"'), 1): url = text.urljoin(self.root, url) page = self.request(url, encoding="utf-8").text track = first = None for url in text.extract_iter(page, '<p><a href="', '"'): track = text.nameext_from_url(url, {"num": num, "url": url}) if first is None: first = track if not fmt or track["extension"] in fmt: first = False yield track if first: yield first def _extract_covers(self, page): return [ text.unescape(text.extr(cover, ' href="', '"')) for cover in text.extract_iter(page, ' class="albumImage', '</') ] ������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� 
gallery_dl-1.30.2/gallery_dl/extractor/komikcast.py

# -*- coding: utf-8 -*-

# Copyright 2018-2025 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for https://komikcast.li/"""

from .common import ChapterExtractor, MangaExtractor
from .. import text, util

BASE_PATTERN = (r"(?:https?://)?(?:www\.)?"
r"komikcast\d*\.(?:l(?:i|a|ol)|com|cz|site|mo?e)") class KomikcastBase(): """Base class for komikcast extractors""" category = "komikcast" root = "https://komikcast.li" def parse_chapter_string(self, chapter_string, data=None): """Parse 'chapter_string' value and add its info to 'data'""" if data is None: data = {} pattern = util.re(r"(?:(.*) Chapter )?0*(\d+)([^ ]*)(?: (?:- )?(.+))?") match = pattern.match(text.unescape(chapter_string)) manga, chapter, data["chapter_minor"], title = match.groups() if manga: data["manga"] = manga.partition(" Chapter ")[0] if title and not title.lower().startswith("bahasa indonesia"): data["title"] = title.strip() else: data["title"] = "" data["chapter"] = text.parse_int(chapter) data["lang"] = "id" data["language"] = "Indonesian" return data class KomikcastChapterExtractor(KomikcastBase, ChapterExtractor): """Extractor for komikcast manga chapters""" pattern = BASE_PATTERN + r"(/chapter/[^/?#]+/)" example = "https://komikcast.li/chapter/TITLE/" def metadata(self, page): info = text.extr(page, "<title>", " - Komikcast<") return self.parse_chapter_string(info) def images(self, page): readerarea = text.extr( page, '<div class="main-reading-area', '</div') pattern = util.re(r"<img[^>]* src=[\"']([^\"']+)") return [ (text.unescape(url), None) for url in pattern.findall(readerarea) ] class KomikcastMangaExtractor(KomikcastBase, MangaExtractor): """Extractor for komikcast manga""" chapterclass = KomikcastChapterExtractor pattern = BASE_PATTERN + r"(/(?:komik/)?[^/?#]+/?)$" example = "https://komikcast.li/komik/TITLE" def chapters(self, page): results = [] data = self.metadata(page) for item in text.extract_iter( page, '<a class="chapter-link-item" href="', '</a'): url, _, chapter = item.rpartition('">Chapter') chapter, sep, minor = chapter.strip().partition(".") data["chapter"] = text.parse_int(chapter) data["chapter_minor"] = sep + minor results.append((url, data.copy())) return results def metadata(self, page): """Return a dict with general metadata""" manga , pos = text.extract(page, "<title>" , " - Komikcast<") genres, pos = text.extract( page, 'class="komik_info-content-genre">', "</span>", pos) author, pos = text.extract(page, ">Author:", "</span>", pos) mtype , pos = text.extract(page, ">Type:" , "</span>", pos) return { "manga": text.unescape(manga), "genres": text.split_html(genres), "author": text.remove_html(author), "type": text.remove_html(mtype), } ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1753336256.0 
����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.30.2/gallery_dl/extractor/leakgallery.py�����������������������������������������������0000644�0001750�0001750�00000011463�15040344700�021410� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://leakgallery.com""" from .common import Extractor, Message from .. import text BASE_PATTERN = r"(?:https?://)?(?:www\.)?leakgallery\.com" class LeakgalleryExtractor(Extractor): category = "leakgallery" directory_fmt = ("{category}", "{creator}") filename_fmt = "{id}_{filename}.{extension}" archive_fmt = "{creator}_{id}" def _yield_media_items(self, medias, creator=None): seen = set() for media in medias: path = media["file_path"] if path in seen: continue seen.add(path) if creator is None: try: media["creator"] = \ media["profile"]["username"] or "unknown" except Exception: media["creator"] = "unknown" else: media["creator"] = creator media["url"] = url = f"https://cdn.leakgallery.com/{path}" text.nameext_from_url(url, media) yield Message.Directory, media yield Message.Url, url, media def _pagination(self, type, base, params=None, creator=None, pnum=1): while True: try: data = self.request_json(f"{base}{pnum}", params=params) if not data: return if "medias" in data: data = data["medias"] if not data or not isinstance(data, list): return yield from self._yield_media_items(data, creator) pnum += 1 except Exception as exc: self.log.error("Failed to retrieve %s page %s: %s", type, pnum, exc) return class LeakgalleryUserExtractor(LeakgalleryExtractor): """Extractor for profile posts on leakgallery.com""" subcategory = "user" pattern = ( BASE_PATTERN + r"/(?!trending-medias|most-liked|random/medias)([^/?#]+)" r"(?:/(Photos|Videos|All))?" r"(?:/(MostRecent|MostViewed|MostLiked))?/?$" ) example = "https://leakgallery.com/creator" def items(self): creator, mtype, msort = self.groups base = f"https://api.leakgallery.com/profile/{creator}/" params = {"type": mtype or "All", "sort": msort or "MostRecent"} return self._pagination(creator, base, params, creator) class LeakgalleryTrendingExtractor(LeakgalleryExtractor): """Extractor for trending posts on leakgallery.com""" subcategory = "trending" pattern = BASE_PATTERN + r"/trending-medias(?:/([\w-]+))?" 
example = "https://leakgallery.com/trending-medias/Week" def items(self): period = self.groups[0] or "Last-Hour" base = f"https://api.leakgallery.com/popular/media/{period}/" return self._pagination("trending", base) class LeakgalleryMostlikedExtractor(LeakgalleryExtractor): """Extractor for most liked posts on leakgallery.com""" subcategory = "mostliked" pattern = BASE_PATTERN + r"/most-liked" example = "https://leakgallery.com/most-liked" def items(self): base = "https://api.leakgallery.com/most-liked/" return self._pagination("most-liked", base) class LeakgalleryPostExtractor(LeakgalleryExtractor): """Extractor for individual posts on leakgallery.com""" subcategory = "post" pattern = BASE_PATTERN + r"/([^/?#]+)/(\d+)" example = "https://leakgallery.com/CREATOR/12345" def items(self): creator, post_id = self.groups url = f"https://leakgallery.com/{creator}/{post_id}" try: page = self.request(url).text video_urls = text.re( r"https://cdn\.leakgallery\.com/content[^/?#]*/" r"(?:compressed_)?watermark_[^\"]+\." r"(?:mp4|mov|m4a|webm)" ).findall(page) image_urls = text.re( r"https://cdn\.leakgallery\.com/content[^/?#]*/" r"watermark_[^\"]+\.(?:jpe?g|png)" ).findall(page) seen = set() for url in video_urls + image_urls: if url in seen: continue seen.add(url) data = { "id": post_id, "creator": creator, "url": url, } text.nameext_from_url(url, data) yield Message.Directory, data yield Message.Url, url, data except Exception as exc: self.log.error("Failed to extract post page %s/%s: %s", creator, post_id, exc) �������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1753336256.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.30.2/gallery_dl/extractor/lensdump.py��������������������������������������������������0000644�0001750�0001750�00000010554�15040344700�020743� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. 
"""Extractors for https://lensdump.com/""" from .common import GalleryExtractor, Extractor, Message from .. import text, util BASE_PATTERN = r"(?:https?://)?lensdump\.com" class LensdumpBase(): """Base class for lensdump extractors""" category = "lensdump" root = "https://lensdump.com" def _pagination(self, page, begin, end): while True: yield from text.extract_iter(page, begin, end) next = text.extr(page, ' data-pagination="next"', '>') if not next: return url = text.urljoin(self.root, text.extr(next, 'href="', '"')) page = self.request(url).text class LensdumpAlbumExtractor(LensdumpBase, GalleryExtractor): subcategory = "album" pattern = BASE_PATTERN + r"/a/(\w+)(?:/?\?([^#]+))?" example = "https://lensdump.com/a/ID" def __init__(self, match): self.gallery_id, query = match.groups() if query: url = f"{self.root}/a/{self.gallery_id}/?{query}" else: url = f"{self.root}/a/{self.gallery_id}" GalleryExtractor.__init__(self, match, url) def metadata(self, page): return { "gallery_id": self.gallery_id, "title": text.unescape(text.extr( page, 'property="og:title" content="', '"').strip()) } def images(self, page): for image in self._pagination(page, ' class="list-item ', '>'): data = util.json_loads(text.unquote( text.extr(image, "data-object='", "'") or text.extr(image, 'data-object="', '"'))) image_id = data.get("name") image_url = data.get("url") image_title = data.get("title") if image_title is not None: image_title = text.unescape(image_title) yield (image_url, { "id" : image_id, "url" : image_url, "title" : image_title, "name" : data.get("filename"), "filename" : image_id, "extension": data.get("extension"), "width" : text.parse_int(data.get("width")), "height" : text.parse_int(data.get("height")), }) class LensdumpAlbumsExtractor(LensdumpBase, Extractor): """Extractor for album list from lensdump.com""" subcategory = "albums" pattern = BASE_PATTERN + r"/(?![ai]/)([^/?#]+)(?:/?\?([^#]+))?" 
example = "https://lensdump.com/USER" def items(self): user, query = self.groups url = f"{self.root}/{user}/" if query: params = text.parse_query(query) else: params = {"sort": "date_asc", "page": "1"} page = self.request(url, params=params).text data = {"_extractor": LensdumpAlbumExtractor} for album_path in self._pagination(page, 'data-url-short="', '"'): album_url = text.urljoin(self.root, album_path) yield Message.Queue, album_url, data class LensdumpImageExtractor(LensdumpBase, Extractor): """Extractor for individual images on lensdump.com""" subcategory = "image" filename_fmt = "{category}_{id}{title:?_//}.{extension}" directory_fmt = ("{category}",) archive_fmt = "{id}" pattern = r"(?:https?://)?(?:(?:i\d?\.)?lensdump\.com|\w\.l3n\.co)/i/(\w+)" example = "https://lensdump.com/i/ID" def items(self): key = self.groups[0] url = f"{self.root}/i/{key}" extr = text.extract_from(self.request(url).text) data = { "id" : key, "title" : text.unescape(extr( 'property="og:title" content="', '"')), "url" : extr( 'property="og:image" content="', '"'), "width" : text.parse_int(extr( 'property="image:width" content="', '"')), "height": text.parse_int(extr( 'property="image:height" content="', '"')), "date" : text.parse_datetime(extr( '<span title="', '"'), "%Y-%m-%d %H:%M:%S"), } text.nameext_from_url(data["url"], data) yield Message.Directory, data yield Message.Url, data["url"], data ����������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1753336256.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.30.2/gallery_dl/extractor/lexica.py����������������������������������������������������0000644�0001750�0001750�00000004316�15040344700�020360� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2023-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://lexica.art/""" from .common import Extractor, Message from .. 
import text class LexicaSearchExtractor(Extractor): """Extractor for lexica.art search results""" category = "lexica" subcategory = "search" root = "https://lexica.art" directory_fmt = ("{category}", "{search_tags}") archive_fmt = "{id}" pattern = r"(?:https?://)?lexica\.art/?\?q=([^&#]+)" example = "https://lexica.art/?q=QUERY" def __init__(self, match): Extractor.__init__(self, match) self.query = match[1] self.text = text.unquote(self.query).replace("+", " ") def items(self): base = ("https://lexica-serve-encoded-images2.sharif.workers.dev" "/full_jpg/") tags = self.text for image in self.posts(): image["filename"] = image["id"] image["extension"] = "jpg" image["search_tags"] = tags yield Message.Directory, image yield Message.Url, base + image["id"], image def posts(self): url = self.root + "/api/infinite-prompts" headers = { "Accept" : "application/json, text/plain, */*", "Referer": f"{self.root}/?q={self.query}", } json = { "text" : self.text, "searchMode": "images", "source" : "search", "cursor" : 0, "model" : "lexica-aperture-v2", } while True: data = self.request_json( url, method="POST", headers=headers, json=json) prompts = { prompt["id"]: prompt for prompt in data["prompts"] } for image in data["images"]: image["prompt"] = prompts[image["promptid"]] del image["promptid"] yield image cursor = data.get("nextCursor") if not cursor: return json["cursor"] = cursor ������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1753336256.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.30.2/gallery_dl/extractor/lightroom.py�������������������������������������������������0000644�0001750�0001750�00000005564�15040344700�021125� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://lightroom.adobe.com/""" from .common import Extractor, Message from .. 
import text, util class LightroomGalleryExtractor(Extractor): """Extractor for an image gallery on lightroom.adobe.com""" category = "lightroom" subcategory = "gallery" directory_fmt = ("{category}", "{user}", "{title}") filename_fmt = "{num:>04}_{id}.{extension}" archive_fmt = "{id}" pattern = r"(?:https?://)?lightroom\.adobe\.com/shares/([0-9a-f]+)" example = "https://lightroom.adobe.com/shares/0123456789abcdef" def __init__(self, match): Extractor.__init__(self, match) self.href = match[1] def items(self): # Get config url = "https://lightroom.adobe.com/shares/" + self.href response = self.request(url) album = util.json_loads( text.extr(response.text, "albumAttributes: ", "\n") ) images = self.images(album) for img in images: url = img["url"] yield Message.Directory, img yield Message.Url, url, text.nameext_from_url(url, img) def metadata(self, album): payload = album["payload"] story = payload.get("story") or {} return { "gallery_id": self.href, "user": story.get("author", ""), "title": story.get("title", payload["name"]), } def images(self, album): album_md = self.metadata(album) base_url = album["base"] next_url = album["links"]["/rels/space_album_images_videos"]["href"] num = 1 while next_url: url = base_url + next_url page = self.request(url).text # skip 1st line as it's a JS loop data = util.json_loads(page[page.index("\n") + 1:]) base_url = data["base"] for res in data["resources"]: img_url, img_size = None, 0 for key, value in res["asset"]["links"].items(): if not key.startswith("/rels/rendition_type/"): continue size = text.parse_int(key.split("/")[-1]) if size > img_size: img_size = size img_url = value["href"] if img_url: img = { "id": res["asset"]["id"], "num": num, "url": base_url + img_url, } img.update(album_md) yield img num += 1 try: next_url = data["links"]["next"]["href"] except KeyError: next_url = None ��������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1753336256.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.30.2/gallery_dl/extractor/livedoor.py��������������������������������������������������0000644�0001750�0001750�00000007766�15040344700�020752� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 
-*- # Copyright 2019-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for http://blog.livedoor.jp/""" from .common import Extractor, Message from .. import text class LivedoorExtractor(Extractor): """Base class for livedoor extractors""" category = "livedoor" root = "http://blog.livedoor.jp" filename_fmt = "{post[id]}_{post[title]}_{num:>02}.{extension}" directory_fmt = ("{category}", "{post[user]}") archive_fmt = "{post[id]}_{hash}" def __init__(self, match): Extractor.__init__(self, match) self.user = match[1] def items(self): for post in self.posts(): if images := self._images(post): yield Message.Directory, {"post": post} for image in images: yield Message.Url, image["url"], image def posts(self): """Return an iterable with post objects""" def _load(self, data, body): extr = text.extract_from(data) tags = text.extr(body, 'class="article-tags">', '</dl>') about = extr('rdf:about="', '"') return { "id" : text.parse_int( about.rpartition("/")[2].partition(".")[0]), "title" : text.unescape(extr('dc:title="', '"')), "categories" : extr('dc:subject="', '"').partition(",")[::2], "description": extr('dc:description="', '"'), "date" : text.parse_datetime(extr('dc:date="', '"')), "tags" : text.split_html(tags)[1:] if tags else [], "user" : self.user, "body" : body, } def _images(self, post): imgs = [] body = post.pop("body") for num, img in enumerate(text.extract_iter(body, "<img ", ">"), 1): src = text.extr(img, 'src="', '"') alt = text.extr(img, 'alt="', '"') if not src: continue if "://livedoor.blogimg.jp/" in src: url = src.replace("http:", "https:", 1).replace("-s.", ".") else: url = text.urljoin(self.root, src) name, _, ext = url.rpartition("/")[2].rpartition(".") imgs.append({ "url" : url, "num" : num, "hash" : name, "filename" : alt or name, "extension": ext, "post" : post, }) return imgs class LivedoorBlogExtractor(LivedoorExtractor): """Extractor for a user's blog on blog.livedoor.jp""" subcategory = "blog" pattern = r"(?:https?://)?blog\.livedoor\.jp/(\w+)/?(?:$|[?#])" example = "http://blog.livedoor.jp/USER/" def posts(self): url = f"{self.root}/{self.user}" while url: extr = text.extract_from(self.request(url).text) while True: data = extr('<rdf:RDF', '</rdf:RDF>') if not data: break body = extr('class="article-body-inner">', 'class="article-footer">') yield self._load(data, body) url = extr('<a rel="next" href="', '"') class LivedoorPostExtractor(LivedoorExtractor): """Extractor for images from a blog post on blog.livedoor.jp""" subcategory = "post" pattern = r"(?:https?://)?blog\.livedoor\.jp/(\w+)/archives/(\d+)" example = "http://blog.livedoor.jp/USER/archives/12345.html" def __init__(self, match): LivedoorExtractor.__init__(self, match) self.post_id = match[2] def posts(self): url = f"{self.root}/{self.user}/archives/{self.post_id}.html" extr = text.extract_from(self.request(url).text) data = extr('<rdf:RDF', '</rdf:RDF>') body = extr('class="article-body-inner">', 'class="article-footer">') return (self._load(data, body),) ����������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� 
x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1753336256.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.30.2/gallery_dl/extractor/lofter.py����������������������������������������������������0000644�0001750�0001750�00000011611�15040344700�020402� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.lofter.com/""" from .common import Extractor, Message from .. import text, util, exception class LofterExtractor(Extractor): """Base class for lofter extractors""" category = "lofter" root = "https://www.lofter.com" directory_fmt = ("{category}", "{blog_name}") filename_fmt = "{id}_{num}.{extension}" archive_fmt = "{id}_{num}" def _init(self): self.api = LofterAPI(self) def items(self): for post in self.posts(): if post is None: continue if "post" in post: post = post["post"] post["blog_name"] = post["blogInfo"]["blogName"] post["date"] = text.parse_timestamp(post["publishTime"] // 1000) post_type = post["type"] # Article if post_type == 1: content = post["content"] image_urls = text.extract_iter(content, '<img src="', '"') image_urls = [text.unescape(x) for x in image_urls] image_urls = [x.partition("?")[0] for x in image_urls] # Photo elif post_type == 2: photo_links = util.json_loads(post["photoLinks"]) image_urls = [x["orign"] for x in photo_links] image_urls = [x.partition("?")[0] for x in image_urls] # Video elif post_type == 4: embed = util.json_loads(post["embed"]) image_urls = [embed["originUrl"]] # Answer elif post_type == 5: images = util.json_loads(post["images"]) image_urls = [x["orign"] for x in images] image_urls = [x.partition("?")[0] for x in image_urls] else: image_urls = () self.log.warning( "%s: Unsupported post type '%s'.", post["id"], post_type) post["count"] = len(image_urls) yield Message.Directory, post for post["num"], url in enumerate(image_urls, 1): yield Message.Url, url, text.nameext_from_url(url, post) def posts(self): return () class LofterPostExtractor(LofterExtractor): """Extractor for a lofter post""" subcategory = "post" pattern = r"(?:https?://)?[\w-]+\.lofter\.com/post/([0-9a-f]+)_([0-9a-f]+)" example = "https://BLOG.lofter.com/post/12345678_90abcdef" def posts(self): blog_id, post_id = self.groups post = self.api.post(int(blog_id, 16), int(post_id, 16)) return 
(post,) class LofterBlogPostsExtractor(LofterExtractor): """Extractor for a lofter blog's posts""" subcategory = "blog-posts" pattern = (r"(?:https?://)?(?:" # https://www.lofter.com/front/blog/home-page/<blog_name> r"www\.lofter\.com/front/blog/home-page/([\w-]+)|" # https://<blog_name>.lofter.com/ r"([\w-]+)\.lofter\.com" r")/?(?:$|\?|#)") example = "https://BLOG.lofter.com/" def posts(self): blog_name = self.groups[0] or self.groups[1] return self.api.blog_posts(blog_name) class LofterAPI(): def __init__(self, extractor): self.extractor = extractor def blog_posts(self, blog_name): endpoint = "/v2.0/blogHomePage.api" params = { "method": "getPostLists", "offset": 0, "limit": 200, "blogdomain": blog_name + ".lofter.com", } return self._pagination(endpoint, params) def post(self, blog_id, post_id): endpoint = "/oldapi/post/detail.api" params = { "targetblogid": blog_id, "postid": post_id, } return self._call(endpoint, params)["posts"][0] def _call(self, endpoint, data): url = "https://api.lofter.com" + endpoint params = { 'product': 'lofter-android-7.9.10' } response = self.extractor.request( url, method="POST", params=params, data=data) info = response.json() if info["meta"]["status"] == 4200: raise exception.NotFoundError("blog") if info["meta"]["status"] != 200: self.extractor.log.debug("Server response: %s", info) raise exception.AbortExtraction("API request failed") return info["response"] def _pagination(self, endpoint, params): while True: data = self._call(endpoint, params) posts = data["posts"] yield from posts if data["offset"] < 0: break if params["offset"] + len(posts) < data["offset"]: break params["offset"] = data["offset"] �����������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1753336256.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.30.2/gallery_dl/extractor/lolisafe.py��������������������������������������������������0000644�0001750�0001750�00000005003�15040344700�020703� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2021-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software 
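# --- Illustrative aside (not part of the gallery_dl sources above) ----------
# A hedged sketch of the request shape used by LofterAPI above: form data is
# POSTed to api.lofter.com with a "product" query parameter, and the payload
# of interest lives under response["response"]. Uses the third-party
# "requests" package; the helper name is made up for illustration.
import requests

def lofter_blog_posts_page(blog_name, offset=0, limit=200):
    url = "https://api.lofter.com/v2.0/blogHomePage.api"
    params = {"product": "lofter-android-7.9.10"}
    data = {
        "method": "getPostLists",
        "offset": offset,
        "limit": limit,
        "blogdomain": blog_name + ".lofter.com",
    }
    info = requests.post(url, params=params, data=data, timeout=30).json()
    if info["meta"]["status"] != 200:
        raise RuntimeError(f"Lofter API request failed: {info['meta']}")
    return info["response"]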
Foundation. """Extractors for lolisafe/chibisafe instances""" from .common import BaseExtractor, Message from .. import text class LolisafeExtractor(BaseExtractor): """Base class for lolisafe extractors""" basecategory = "lolisafe" directory_fmt = ("{category}", "{album_name} ({album_id})") archive_fmt = "{album_id}_{id}" BASE_PATTERN = LolisafeExtractor.update({ }) class LolisafeAlbumExtractor(LolisafeExtractor): subcategory = "album" pattern = BASE_PATTERN + "/a/([^/?#]+)" example = "https://xbunkr.com/a/ID" def __init__(self, match): LolisafeExtractor.__init__(self, match) self.album_id = self.groups[-1] def _init(self): domain = self.config("domain") if domain == "auto": self.root = text.root_from_url(self.url) elif domain: self.root = text.ensure_http_scheme(domain) def items(self): files, data = self.fetch_album(self.album_id) yield Message.Directory, data for data["num"], file in enumerate(files, 1): url = file["file"] file.update(data) if "extension" not in file: text.nameext_from_url(url, file) if "name" in file: name = file["name"] file["name"] = name.rpartition(".")[0] or name _, sep, fid = file["filename"].rpartition("-") if not sep or len(fid) == 12: if "id" not in file: file["id"] = "" file["filename"] = file["name"] else: file["id"] = fid file["filename"] = file["name"] + "-" + fid elif "id" in file: file["name"] = file["filename"] file["filename"] = f"{file['name']}-{file['id']}" else: file["name"], sep, file["id"] = \ file["filename"].rpartition("-") yield Message.Url, url, file def fetch_album(self, album_id): url = f"{self.root}/api/album/get/{album_id}" data = self.request_json(url) return data["files"], { "album_id" : self.album_id, "album_name": text.unescape(data["title"]), "count" : data["count"], } �����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1753336256.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.30.2/gallery_dl/extractor/luscious.py��������������������������������������������������0000644�0001750�0001750�00000021335�15040344700�020761� 
0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2016-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://members.luscious.net/""" from .common import Extractor, Message from .. import text, exception class LusciousExtractor(Extractor): """Base class for luscious extractors""" category = "luscious" cookies_domain = ".luscious.net" root = "https://members.luscious.net" def _graphql(self, op, variables, query): data = { "id" : 1, "operationName": op, "query" : query, "variables" : variables, } response = self.request( f"{self.root}/graphql/nobatch/?operationName={op}", method="POST", json=data, fatal=False, ) if response.status_code >= 400: self.log.debug("Server response: %s", response.text) raise exception.AbortExtraction( f"GraphQL query failed " f"('{response.status_code} {response.reason}')") return response.json()["data"] class LusciousAlbumExtractor(LusciousExtractor): """Extractor for image albums from luscious.net""" subcategory = "album" filename_fmt = "{category}_{album[id]}_{num:>03}.{extension}" directory_fmt = ("{category}", "{album[id]} {album[title]}") archive_fmt = "{album[id]}_{id}" pattern = (r"(?:https?://)?(?:www\.|members\.)?luscious\.net" r"/(?:albums|pictures/c/[^/?#]+/album)/[^/?#]+_(\d+)") example = "https://luscious.net/albums/TITLE_12345/" def __init__(self, match): LusciousExtractor.__init__(self, match) self.album_id = match[1] def _init(self): self.gif = self.config("gif", False) def items(self): album = self.metadata() yield Message.Directory, {"album": album} for num, image in enumerate(self.images(), 1): image["num"] = num image["album"] = album try: image["thumbnail"] = image.pop("thumbnails")[0]["url"] except LookupError: image["thumbnail"] = "" image["tags"] = [item["text"] for item in image["tags"]] image["date"] = text.parse_timestamp(image["created"]) image["id"] = text.parse_int(image["id"]) url = (image["url_to_original"] or image["url_to_video"] if self.gif else image["url_to_video"] or image["url_to_original"]) yield Message.Url, url, text.nameext_from_url(url, image) def metadata(self): variables = { "id": self.album_id, } query = """ query AlbumGet($id: ID!) { album { get(id: $id) { ... on Album { ...AlbumStandard } ... 
on MutationError { errors { code message } } } } } fragment AlbumStandard on Album { __typename id title labels description created modified like_status number_of_favorites rating status marked_for_deletion marked_for_processing number_of_pictures number_of_animated_pictures slug is_manga url download_url permissions cover { width height size url } created_by { id name display_name user_title avatar { url size } url } content { id title url } language { id title url } tags { id category text url count } genres { id title slug url } audiences { id title url url } last_viewed_picture { id position url } } """ album = self._graphql("AlbumGet", variables, query)["album"]["get"] if "errors" in album: raise exception.NotFoundError("album") album["audiences"] = [item["title"] for item in album["audiences"]] album["genres"] = [item["title"] for item in album["genres"]] album["tags"] = [item["text"] for item in album["tags"]] album["cover"] = album["cover"]["url"] album["content"] = album["content"]["title"] album["language"] = album["language"]["title"].partition(" ")[0] album["created_by"] = album["created_by"]["display_name"] album["id"] = text.parse_int(album["id"]) album["date"] = text.parse_timestamp(album["created"]) return album def images(self): variables = { "input": { "filters": [{ "name" : "album_id", "value": self.album_id, }], "display": "position", "page" : 1, }, } query = """ query AlbumListOwnPictures($input: PictureListInput!) { picture { list(input: $input) { info { ...FacetCollectionInfo } items { ...PictureStandardWithoutAlbum } } } } fragment FacetCollectionInfo on FacetCollectionInfo { page has_next_page has_previous_page total_items total_pages items_per_page url_complete url_filters_only } fragment PictureStandardWithoutAlbum on Picture { __typename id title created like_status number_of_comments number_of_favorites status width height resolution aspect_ratio url_to_original url_to_video is_animated position tags { id category text url } permissions url thumbnails { width height size url } } """ while True: data = self._graphql("AlbumListOwnPictures", variables, query) yield from data["picture"]["list"]["items"] if not data["picture"]["list"]["info"]["has_next_page"]: return variables["input"]["page"] += 1 class LusciousSearchExtractor(LusciousExtractor): """Extractor for album searches on luscious.net""" subcategory = "search" pattern = (r"(?:https?://)?(?:www\.|members\.)?luscious\.net" r"/albums/list/?(?:\?([^#]+))?") example = "https://luscious.net/albums/list/?tagged=TAG" def __init__(self, match): LusciousExtractor.__init__(self, match) self.query = match[1] def items(self): query = text.parse_query(self.query) display = query.pop("display", "date_newest") page = query.pop("page", None) variables = { "input": { "display": display, "filters": [{"name": n, "value": v} for n, v in query.items()], "page": text.parse_int(page, 1), }, } query = """ query AlbumListWithPeek($input: AlbumListInput!) 
{ album { list(input: $input) { info { ...FacetCollectionInfo } items { ...AlbumMinimal peek_thumbnails { width height size url } } } } } fragment FacetCollectionInfo on FacetCollectionInfo { page has_next_page has_previous_page total_items total_pages items_per_page url_complete url_filters_only } fragment AlbumMinimal on Album { __typename id title labels description created modified number_of_favorites number_of_pictures slug is_manga url download_url cover { width height size url } content { id title url } language { id title url } tags { id category text url count } genres { id title slug url } audiences { id title url } } """ while True: data = self._graphql("AlbumListWithPeek", variables, query) for album in data["album"]["list"]["items"]: album["url"] = self.root + album["url"] album["_extractor"] = LusciousAlbumExtractor yield Message.Queue, album["url"], album if not data["album"]["list"]["info"]["has_next_page"]: return variables["input"]["page"] += 1 ���������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1753336256.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.30.2/gallery_dl/extractor/lynxchan.py��������������������������������������������������0000644�0001750�0001750�00000004701�15040344700�020735� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for LynxChan Imageboards""" from .common import BaseExtractor, Message from .. 
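# --- Illustrative aside (not part of the gallery_dl sources above) ----------
# A standalone sketch of the GraphQL call pattern LusciousExtractor._graphql
# uses above: the operation name is repeated in the query string, and the JSON
# payload carries id/operationName/query/variables. Uses "requests"; the
# helper name is made up for illustration.
import requests

def luscious_graphql(op, query, variables):
    url = f"https://members.luscious.net/graphql/nobatch/?operationName={op}"
    payload = {
        "id": 1,
        "operationName": op,
        "query": query,
        "variables": variables,
    }
    response = requests.post(url, json=payload, timeout=30)
    response.raise_for_status()
    return response.json()["data"]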
import text import itertools class LynxchanExtractor(BaseExtractor): """Base class for LynxChan extractors""" basecategory = "lynxchan" BASE_PATTERN = LynxchanExtractor.update({ "bbw-chan": { "root": "https://bbw-chan.link", "pattern": r"bbw-chan\.(?:link|nl)", }, "kohlchan": { "root": "https://kohlchan.net", "pattern": r"kohlchan\.net", }, "endchan": { "root": None, "pattern": r"endchan\.(?:org|net|gg)", }, }) class LynxchanThreadExtractor(LynxchanExtractor): """Extractor for LynxChan threads""" subcategory = "thread" directory_fmt = ("{category}", "{boardUri}", "{threadId} {subject|message[:50]}") filename_fmt = "{postId}{num:?-//} {filename}.{extension}" archive_fmt = "{boardUri}_{postId}_{num}" pattern = BASE_PATTERN + r"/([^/?#]+)/res/(\d+)" example = "https://endchan.org/a/res/12345.html" def items(self): url = f"{self.root}/{self.groups[-2]}/res/{self.groups[-1]}.json" thread = self.request_json(url) thread["postId"] = thread["threadId"] posts = thread.pop("posts", ()) yield Message.Directory, thread for post in itertools.chain((thread,), posts): if files := post.pop("files", ()): thread.update(post) for num, file in enumerate(files): file.update(thread) file["num"] = num url = self.root + file["path"] text.nameext_from_url(file["originalName"], file) yield Message.Url, url, file class LynxchanBoardExtractor(LynxchanExtractor): """Extractor for LynxChan boards""" subcategory = "board" pattern = BASE_PATTERN + r"/([^/?#]+)(?:/index|/catalog|/\d+|/?$)" example = "https://endchan.org/a/" def items(self): board = self.groups[-1] url = f"{self.root}/{board}/catalog.json" for thread in self.request_json(url): url = f"{self.root}/{board}/res/{thread['threadId']}.html" thread["_extractor"] = LynxchanThreadExtractor yield Message.Queue, url, thread ���������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1753336256.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.30.2/gallery_dl/extractor/madokami.py��������������������������������������������������0000644�0001750�0001750�00000006633�15040344700�020701� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public 
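# --- Illustrative aside (not part of the gallery_dl sources above) ----------
# A small sketch of the JSON endpoint LynxchanThreadExtractor relies on:
# /<board>/res/<thread>.json holds the opening post plus a "posts" list, and
# each post may carry a "files" list whose "path" is relative to the instance
# root. The helper name is made up for illustration.
import requests

def lynxchan_file_urls(root, board, thread_id):
    url = f"{root}/{board}/res/{thread_id}.json"
    thread = requests.get(url, timeout=30).json()
    urls = []
    for post in [thread] + (thread.get("posts") or []):
        for file in post.get("files") or ():
            urls.append(root + file["path"])
    return urls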
License version 2 as # published by the Free Software Foundation. """Extractors for https://manga.madokami.al/""" from .common import Extractor, Message from .. import text, util, exception BASE_PATTERN = r"(?:https?://)?manga\.madokami\.al" class MadokamiExtractor(Extractor): """Base class for madokami extractors""" category = "madokami" root = "https://manga.madokami.al" class MadokamiMangaExtractor(MadokamiExtractor): """Extractor for madokami manga""" subcategory = "manga" directory_fmt = ("{category}", "{manga}") archive_fmt = "{chapter_id}" pattern = rf"{BASE_PATTERN}/Manga/(\w/\w{{2}}/\w{{4}}/.+)" example = "https://manga.madokami.al/Manga/A/AB/ABCD/ABCDE_TITLE" def items(self): username, password = self._get_auth_info() if not username: raise exception.AuthRequired("'username' & 'password'") self.session.auth = util.HTTPBasicAuth(username, password) url = f"{self.root}/Manga/{self.groups[0]}" page = self.request(url).text extr = text.extract_from(page) chapters = [] while True: if not (cid := extr('<tr data-record="', '"')): break chapters.append({ "chapter_id": text.parse_int(cid), "path": text.unescape(extr('href="', '"')), "chapter_string": text.unescape(extr(">", "<")), "size": text.parse_bytes(extr("<td>", "</td>")), "date": text.parse_datetime( extr("<td>", "</td>").strip(), "%Y-%m-%d %H:%M"), }) if self.config("chapter-reverse"): chapters.reverse() self.kwdict.update({ "manga" : text.unescape(extr('itemprop="name">', "<")), "year" : text.parse_int(extr( 'itemprop="datePublished" content="', "-")), "author": text.split_html(extr('<p class="staff', "</p>"))[1::2], "genre" : text.split_html(extr("<h3>Genres</h3>", "</div>")), "tags" : text.split_html(extr("<h3>Tags</h3>", "</div>")), "complete": extr('span class="scanstatus">', "<").lower() == "yes", }) search_chstr = text.re( r"(?i)((?:v(?:ol)?\.?\s*(\d+))" r"(?:\s+ch?\.?\s*(\d+)(?:-(\d+))?)?)").search search_chstr_min = text.re( r"(?i)(ch?\.?\s*(\d+)(?:-(\d+))?)").search for ch in chapters: chstr = ch["chapter_string"] if match := search_chstr(chstr): ch["chapter_string"], volume, chapter, end = match.groups() ch["volume"] = text.parse_int(volume) ch["chapter"] = text.parse_int(chapter) ch["chapter_end"] = text.parse_int(end) elif match := search_chstr_min(chstr): ch["chapter_string"], chapter, end = match.groups() ch["volume"] = 0 ch["chapter"] = text.parse_int(chapter) ch["chapter_end"] = text.parse_int(end) else: ch["volume"] = ch["chapter"] = ch["chapter_end"] = 0 url = f"{self.root}{ch['path']}" text.nameext_from_url(url, ch) yield Message.Directory, ch yield Message.Url, url, ch �����������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1753336256.0 
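# --- Illustrative aside (not part of the gallery_dl sources above) ----------
# The chapter-string parsing done by MadokamiMangaExtractor above, shown in
# isolation with the standard "re" module instead of text.re: try the
# "Vol. X Ch. Y[-Z]" form first, then fall back to a chapter-only match.
# The wrapper function is made up for illustration.
import re

_VOL_CH = re.compile(
    r"(?i)((?:v(?:ol)?\.?\s*(\d+))"
    r"(?:\s+ch?\.?\s*(\d+)(?:-(\d+))?)?)")
_CH_ONLY = re.compile(r"(?i)(ch?\.?\s*(\d+)(?:-(\d+))?)")

def parse_chapter_string(chstr):
    if match := _VOL_CH.search(chstr):
        _, volume, chapter, end = match.groups()
    elif match := _CH_ONLY.search(chstr):
        volume = 0
        _, chapter, end = match.groups()
    else:
        return 0, 0, 0
    return int(volume or 0), int(chapter or 0), int(end or 0)

# parse_chapter_string("Vol.03 Ch.012-014") -> (3, 12, 14)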
����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.30.2/gallery_dl/extractor/mangadex.py��������������������������������������������������0000644�0001750�0001750�00000034066�15040344700�020704� 0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2018-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://mangadex.org/""" from .common import Extractor, Message from .. import text, util, exception from ..cache import cache, memcache from collections import defaultdict BASE_PATTERN = r"(?:https?://)?(?:www\.)?mangadex\.(?:org|cc)" class MangadexExtractor(Extractor): """Base class for mangadex extractors""" category = "mangadex" directory_fmt = ( "{category}", "{manga}", "{volume:?v/ />02}c{chapter:>03}{chapter_minor}{title:?: //}") filename_fmt = ( "{manga}_c{chapter:>03}{chapter_minor}_{page:>03}.{extension}") archive_fmt = "{chapter_id}_{page}" root = "https://mangadex.org" useragent = util.USERAGENT _cache = {} def _init(self): self.uuid = self.groups[0] self.api = MangadexAPI(self) def items(self): for chapter in self.chapters(): uuid = chapter["id"] data = self._transform(chapter) data["_extractor"] = MangadexChapterExtractor self._cache[uuid] = data yield Message.Queue, self.root + "/chapter/" + uuid, data def _items_manga(self): data = {"_extractor": MangadexMangaExtractor} for manga in self.manga(): url = f"{self.root}/title/{manga['id']}" yield Message.Queue, url, data def _transform(self, chapter): relationships = defaultdict(list) for item in chapter["relationships"]: relationships[item["type"]].append(item) manga = self.api.manga(relationships["manga"][0]["id"]) for item in manga["relationships"]: relationships[item["type"]].append(item) cattributes = chapter["attributes"] mattributes = manga["attributes"] if lang := cattributes.get("translatedLanguage"): lang = lang.partition("-")[0] if cattributes["chapter"]: chnum, sep, minor = cattributes["chapter"].partition(".") else: chnum, sep, minor = 0, "", "" data = { "manga" : (mattributes["title"].get("en") or next(iter(mattributes["title"].values()))), "manga_id": manga["id"], "title" : cattributes["title"], "volume" : text.parse_int(cattributes["volume"]), "chapter" : text.parse_int(chnum), "chapter_minor": sep + minor, "chapter_id": chapter["id"], "date" : text.parse_datetime(cattributes["publishAt"]), "lang" : lang, "language": util.code_to_language(lang), "count" : cattributes["pages"], "_external_url": cattributes.get("externalUrl"), } data["artist"] = [artist["attributes"]["name"] for artist in relationships["artist"]] data["author"] = [author["attributes"]["name"] for author in 
relationships["author"]] data["group"] = [group["attributes"]["name"] for group in relationships["scanlation_group"]] data["status"] = mattributes["status"] data["tags"] = [tag["attributes"]["name"]["en"] for tag in mattributes["tags"]] return data class MangadexChapterExtractor(MangadexExtractor): """Extractor for manga-chapters from mangadex.org""" subcategory = "chapter" pattern = BASE_PATTERN + r"/chapter/([0-9a-f-]+)" example = ("https://mangadex.org/chapter" "/01234567-89ab-cdef-0123-456789abcdef") def items(self): try: data = self._cache.pop(self.uuid) except KeyError: chapter = self.api.chapter(self.uuid) data = self._transform(chapter) if data.get("_external_url") and not data["count"]: raise exception.AbortExtraction( f"Chapter {data['chapter']}{data['chapter_minor']} is not " f"available on MangaDex and can instead be read on the " f"official publisher's website at {data['_external_url']}.") yield Message.Directory, data server = self.api.athome_server(self.uuid) chapter = server["chapter"] base = f"{server['baseUrl']}/data/{chapter['hash']}/" enum = util.enumerate_reversed if self.config( "page-reverse") else enumerate for data["page"], page in enum(chapter["data"], 1): text.nameext_from_url(page, data) yield Message.Url, base + page, data class MangadexMangaExtractor(MangadexExtractor): """Extractor for manga from mangadex.org""" subcategory = "manga" pattern = BASE_PATTERN + r"/(?:title|manga)/(?!follows|feed$)([0-9a-f-]+)" example = ("https://mangadex.org/title" "/01234567-89ab-cdef-0123-456789abcdef") def chapters(self): return self.api.manga_feed(self.uuid) class MangadexFeedExtractor(MangadexExtractor): """Extractor for chapters from your Updates Feed""" subcategory = "feed" pattern = BASE_PATTERN + r"/titles?/feed$()" example = "https://mangadex.org/title/feed" def chapters(self): return self.api.user_follows_manga_feed() class MangadexFollowingExtractor(MangadexExtractor): """Extractor for followed manga from your Library""" subcategory = "following" pattern = BASE_PATTERN + r"/titles?/follows(?:\?([^#]+))?$" example = "https://mangadex.org/title/follows" items = MangadexExtractor._items_manga def manga(self): return self.api.user_follows_manga() class MangadexListExtractor(MangadexExtractor): """Extractor for mangadex MDLists""" subcategory = "list" pattern = (BASE_PATTERN + r"/list/([0-9a-f-]+)(?:/[^/?#]*)?(?:\?tab=(\w+))?") example = ("https://mangadex.org/list" "/01234567-89ab-cdef-0123-456789abcdef/NAME") def __init__(self, match): if match[2] == "feed": self.subcategory = "list-feed" else: self.items = self._items_manga MangadexExtractor.__init__(self, match) def chapters(self): return self.api.list_feed(self.uuid) def manga(self): return [ item for item in self.api.list(self.uuid)["relationships"] if item["type"] == "manga" ] class MangadexAuthorExtractor(MangadexExtractor): """Extractor for mangadex authors""" subcategory = "author" pattern = BASE_PATTERN + r"/author/([0-9a-f-]+)" example = ("https://mangadex.org/author" "/01234567-89ab-cdef-0123-456789abcdef/NAME") def items(self): for manga in self.api.manga_author(self.uuid): manga["_extractor"] = MangadexMangaExtractor url = f"{self.root}/title/{manga['id']}" yield Message.Queue, url, manga class MangadexAPI(): """Interface for the MangaDex API v5 https://api.mangadex.org/docs/ """ def __init__(self, extr): self.extractor = extr self.headers = None self.headers_auth = {} self.username, self.password = extr._get_auth_info() if self.username: self.client_id = cid = extr.config("client-id") self.client_secret = 
extr.config("client-secret") if cid: self._authenticate_impl = self._authenticate_impl_client else: self._authenticate_impl = self._authenticate_impl_legacy else: self.authenticate = util.noop server = extr.config("api-server") self.root = ("https://api.mangadex.org" if server is None else text.ensure_http_scheme(server).rstrip("/")) def athome_server(self, uuid): return self._call("/at-home/server/" + uuid) def author(self, uuid, manga=False): params = {"includes[]": ("manga",)} if manga else None return self._call("/author/" + uuid, params)["data"] def chapter(self, uuid): params = {"includes[]": ("scanlation_group",)} return self._call("/chapter/" + uuid, params)["data"] def list(self, uuid): return self._call("/list/" + uuid, None, True)["data"] def list_feed(self, uuid): return self._pagination_chapters("/list/" + uuid + "/feed", None, True) @memcache(keyarg=1) def manga(self, uuid): params = {"includes[]": ("artist", "author")} return self._call("/manga/" + uuid, params)["data"] def manga_author(self, uuid_author): params = {"authorOrArtist": uuid_author} return self._pagination_manga("/manga", params) def manga_feed(self, uuid): order = "desc" if self.extractor.config("chapter-reverse") else "asc" params = { "order[volume]" : order, "order[chapter]": order, } return self._pagination_chapters("/manga/" + uuid + "/feed", params) def user_follows_manga(self): params = {"contentRating": None} return self._pagination_manga( "/user/follows/manga", params, True) def user_follows_manga_feed(self): params = {"order[publishAt]": "desc"} return self._pagination_chapters( "/user/follows/manga/feed", params, True) def authenticate(self): self.headers_auth["Authorization"] = \ self._authenticate_impl(self.username, self.password) @cache(maxage=900, keyarg=1) def _authenticate_impl_client(self, username, password): if refresh_token := _refresh_token_cache((username, "personal")): self.extractor.log.info("Refreshing access token") data = { "grant_type" : "refresh_token", "refresh_token": refresh_token, "client_id" : self.client_id, "client_secret": self.client_secret, } else: self.extractor.log.info("Logging in as %s", username) data = { "grant_type" : "password", "username" : self.username, "password" : self.password, "client_id" : self.client_id, "client_secret": self.client_secret, } self.extractor.log.debug("Using client-id '%s…'", self.client_id[:24]) url = ("https://auth.mangadex.org/realms/mangadex" "/protocol/openid-connect/token") data = self.extractor.request_json( url, method="POST", data=data, fatal=None) try: access_token = data["access_token"] except Exception: raise exception.AuthenticationError(data.get("error_description")) if refresh_token != data.get("refresh_token"): _refresh_token_cache.update( (username, "personal"), data["refresh_token"]) return "Bearer " + access_token @cache(maxage=900, keyarg=1) def _authenticate_impl_legacy(self, username, password): if refresh_token := _refresh_token_cache(username): self.extractor.log.info("Refreshing access token") url = self.root + "/auth/refresh" json = {"token": refresh_token} else: self.extractor.log.info("Logging in as %s", username) url = self.root + "/auth/login" json = {"username": username, "password": password} self.extractor.log.debug("Using legacy login method") data = self.extractor.request_json( url, method="POST", json=json, fatal=None) if data.get("result") != "ok": raise exception.AuthenticationError() if refresh_token != data["token"]["refresh"]: _refresh_token_cache.update(username, data["token"]["refresh"]) return 
"Bearer " + data["token"]["session"] def _call(self, endpoint, params=None, auth=False): url = self.root + endpoint headers = self.headers_auth if auth else self.headers while True: if auth: self.authenticate() response = self.extractor.request( url, params=params, headers=headers, fatal=None) if response.status_code < 400: return response.json() if response.status_code == 429: until = response.headers.get("X-RateLimit-Retry-After") self.extractor.wait(until=until) continue msg = ", ".join(f'{error["title"]}: "{error["detail"]}"' for error in response.json()["errors"]) raise exception.AbortExtraction( f"{response.status_code} {response.reason} ({msg})") def _pagination_chapters(self, endpoint, params=None, auth=False): if params is None: params = {} lang = self.extractor.config("lang") if isinstance(lang, str) and "," in lang: lang = lang.split(",") params["translatedLanguage[]"] = lang params["includes[]"] = ("scanlation_group",) return self._pagination(endpoint, params, auth) def _pagination_manga(self, endpoint, params=None, auth=False): if params is None: params = {} return self._pagination(endpoint, params, auth) def _pagination(self, endpoint, params, auth=False): config = self.extractor.config if "contentRating" not in params: ratings = config("ratings") if ratings is None: ratings = ("safe", "suggestive", "erotica", "pornographic") elif isinstance(ratings, str): ratings = ratings.split(",") params["contentRating[]"] = ratings params["offset"] = 0 if api_params := config("api-parameters"): params.update(api_params) while True: data = self._call(endpoint, params, auth) yield from data["data"] params["offset"] = data["offset"] + data["limit"] if params["offset"] >= data["total"]: return @cache(maxage=90*86400, keyarg=0) def _refresh_token_cache(username): return None ��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������././@PaxHeader��������������������������������������������������������������������������������������0000000�0000000�0000000�00000000026�00000000000�010213� x����������������������������������������������������������������������������������������������������ustar�00�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������22 mtime=1753336256.0 ����������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������gallery_dl-1.30.2/gallery_dl/extractor/mangafox.py��������������������������������������������������0000644�0001750�0001750�00000007507�15040344700�020720� 
0����������������������������������������������������������������������������������������������������ustar�00mike����������������������������mike�������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������# -*- coding: utf-8 -*- # Copyright 2017-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://fanfox.net/""" from .common import ChapterExtractor, MangaExtractor from .. import text BASE_PATTERN = r"(?:https?://)?(?:www\.|m\.)?(?:fanfox\.net|mangafox\.me)" class MangafoxChapterExtractor(ChapterExtractor): """Extractor for manga chapters from fanfox.net""" category = "mangafox" root = "https://m.fanfox.net" pattern = BASE_PATTERN + \ r"(/manga/[^/?#]+/((?:v([^/?#]+)/)?c(\d+)([^/?#]*)))" example = "https://fanfox.net/manga/TITLE/v01/c001/1.html" def __init__(self, match): base, self.cstr, self.volume, self.chapter, self.minor = match.groups() self.urlbase = self.root + base ChapterExtractor.__init__(self, match, self.urlbase + "/1.html") def metadata(self, page): manga, pos = text.extract(page, "<title>", "") count, pos = text.extract( page, ">", "<", page.find("", pos) - 40) sid , pos = text.extract(page, "var series_id =", ";", pos) cid , pos = text.extract(page, "var chapter_id =", ";", pos) return { "manga" : text.unescape(manga), "volume" : text.parse_int(self.volume), "chapter" : text.parse_int(self.chapter), "chapter_minor" : self.minor or "", "chapter_string": self.cstr, "count" : text.parse_int(count), "sid" : text.parse_int(sid), "cid" : text.parse_int(cid), } def images(self, page): pnum = 1 while True: url, pos = text.extract(page, '', '

    ') author = extr('

    Author(s):', '

    ') extr('
    ', '') genres, _, summary = text.extr( page, '
    ', '' ).partition('
    ') data = { "manga" : text.unescape(manga), "author" : text.remove_html(author), "description": text.unescape(text.remove_html(summary)), "tags" : text.split_html(genres), "lang" : "en", "language" : "English", } while True: url = "https://" + extr('', ''), "%b %d, %Y"), } chapter.update(data) results.append((url, chapter)) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/mangahere.py0000644000175000017500000001076115040344700021043 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2015-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.mangahere.cc/""" from .common import ChapterExtractor, MangaExtractor from .. import text, util class MangahereBase(): """Base class for mangahere extractors""" category = "mangahere" root = "https://www.mangahere.cc" root_mobile = "https://m.mangahere.cc" class MangahereChapterExtractor(MangahereBase, ChapterExtractor): """Extractor for manga-chapters from mangahere.cc""" pattern = (r"(?:https?://)?(?:www\.|m\.)?mangahere\.c[co]/manga/" r"([^/]+(?:/v0*(\d+))?/c([^/?#]+))") example = "https://www.mangahere.cc/manga/TITLE/c001/1.html" def __init__(self, match): self.part, self.volume, self.chapter = match.groups() self.base = f"{self.root_mobile}/manga/{self.part}/" ChapterExtractor.__init__(self, match, f"{self.base}1.html") def _init(self): self.session.headers["Referer"] = self.root_mobile + "/" def metadata(self, page): pos = page.index("") count , pos = text.extract(page, ">", "<", pos - 40) manga_id , pos = text.extract(page, "series_id = ", ";", pos) chapter_id, pos = text.extract(page, "chapter_id = ", ";", pos) manga , pos = text.extract(page, '"name":"', '"', pos) chapter, dot, minor = self.chapter.partition(".") return { "manga": text.unescape(manga), "manga_id": text.parse_int(manga_id), "title": self._get_title(), "volume": text.parse_int(self.volume), "chapter": text.parse_int(chapter), "chapter_minor": dot + minor, "chapter_id": text.parse_int(chapter_id), "count": text.parse_int(count), "lang": "en", "language": "English", } def images(self, page): pnum = 1 while True: url, pos = text.extract(page, '', '<', pos) date, pos = text.extract(page, 'class="title2">', '<', pos) match = util.re( r"(?:Vol\.0*(\d+) )?Ch\.0*(\d+)(\S*)(?: - (.*))?").match(info) if match: volume, chapter, minor, title = match.groups() else: chapter, _, minor = url[:-1].rpartition("/c")[2].partition(".") minor = "." + minor volume = 0 title = "" results.append((text.urljoin(self.root, url), { "manga": manga, "title": text.unescape(title) if title else "", "volume": text.parse_int(volume), "chapter": text.parse_int(chapter), "chapter_minor": minor, "date": date, "lang": "en", "language": "English", })) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/manganelo.py0000644000175000017500000001072415040344700021054 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2020 Jake Mannens # Copyright 2021-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. 
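# --- Illustrative aside (not part of the gallery_dl sources above) ----------
# The chapter-listing regex used by MangahereMangaExtractor above, exercised
# on a hypothetical entry; when it does not match, the extractor falls back to
# reading the chapter number out of the chapter URL itself.
import re

info_re = re.compile(r"(?:Vol\.0*(\d+) )?Ch\.0*(\d+)(\S*)(?: - (.*))?")

volume, chapter, minor, title = info_re.match("Vol.02 Ch.013.5 - Extra").groups()
# volume="2", chapter="13", minor=".5", title="Extra"

# URL fallback, mirroring url[:-1].rpartition("/c")[2].partition("."):
url = "/manga/example_title/c095.5/"
chapter, _, minor = url[:-1].rpartition("/c")[2].partition(".")
# chapter="095", minor="5" (the extractor then re-prefixes minor with ".")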
"""Extractors for https://www.mangakakalot.gg/ and mirror sites""" from .common import BaseExtractor, ChapterExtractor, MangaExtractor from .. import text, util class ManganeloExtractor(BaseExtractor): basecategory = "manganelo" BASE_PATTERN = ManganeloExtractor.update({ "nelomanga": { "root" : "https://www.nelomanga.net", "pattern": r"(?:www\.)?nelomanga\.net", }, "natomanga": { "root" : "https://www.natomanga.com", "pattern": r"(?:www\.)?natomanga\.com", }, "manganato": { "root" : "https://www.manganato.gg", "pattern": r"(?:www\.)?manganato\.gg", }, "mangakakalot": { "root" : "https://www.mangakakalot.gg", "pattern": r"(?:www\.)?mangakakalot\.gg", }, }) class ManganeloChapterExtractor(ManganeloExtractor, ChapterExtractor): """Extractor for manganelo manga chapters""" pattern = BASE_PATTERN + r"(/manga/[^/?#]+/chapter-[^/?#]+)" example = "https://www.mangakakalot.gg/manga/MANGA_NAME/chapter-123" def __init__(self, match): ManganeloExtractor.__init__(self, match) self.page_url = self.root + self.groups[-1] def metadata(self, page): extr = text.extract_from(page) data = { "date" : text.parse_datetime(extr( '"datePublished": "', '"')[:19], "%Y-%m-%dT%H:%M:%S"), "date_updated": text.parse_datetime(extr( '"dateModified": "', '"')[:19], "%Y-%m-%dT%H:%M:%S"), "manga_id" : text.parse_int(extr("comic_id =", ";")), "chapter_id" : text.parse_int(extr("chapter_id =", ";")), "manga" : extr("comic_name =", ";").strip('" '), "lang" : "en", "language" : "English", } chapter_name = extr("chapter_name =", ";").strip('" ') chapter, sep, minor = chapter_name.rpartition(" ")[2].partition(".") data["chapter"] = text.parse_int(chapter) data["chapter_minor"] = sep + minor data["author"] = extr(". Author:", " already has ").strip() return data def images(self, page): extr = text.extract_from(page) cdns = util.json_loads(extr("var cdns =", ";"))[0] imgs = util.json_loads(extr("var chapterImages =", ";")) if cdns[-1] != "/": cdns += "/" return [ (cdns + path, None) for path in imgs ] class ManganeloMangaExtractor(ManganeloExtractor, MangaExtractor): """Extractor for manganelo manga""" chapterclass = ManganeloChapterExtractor pattern = BASE_PATTERN + r"(/manga/[^/?#]+)$" example = "https://www.mangakakalot.gg/manga/MANGA_NAME" def __init__(self, match): ManganeloExtractor.__init__(self, match) self.page_url = self.root + self.groups[-1] def chapters(self, page): extr = text.extract_from(page) manga = text.unescape(extr("

    ", "<")) author = text.remove_html(extr("
  • Author(s) :", "")) status = extr("
  • Status :", "<").strip() update = text.parse_datetime(extr( "
  • Last updated :", "<").strip(), "%b-%d-%Y %I:%M:%S %p") tags = text.split_html(extr(">Genres :", "
  • "))[::2] results = [] for chapter in text.extract_iter(page, '
    ', '
    '): url, pos = text.extract(chapter, '', '', pos) date, pos = text.extract(chapter, '', '

    ') data = {"tags": list(text.split_html(tags)[::2])} info = text.extr(page, '

    ', "

    ") if not info: raise exception.NotFoundError("chapter") self.parse_chapter_string(info, data) return data def images(self, page): page = text.extr( page, '
    ', '
    "): url , pos = text.extract(chapter, '", "", pos) self.parse_chapter_string(info, data) results.append((url, data.copy())) return results def metadata(self, page): extr = text.extract_from(text.extr( page, 'class="summary_content">', 'class="manga-action"')) return { "manga" : text.extr(page, "

    ", "

    ").strip(), "description": text.unescape(text.remove_html(text.extract( page, ">", "
    ", page.index("summary__content"))[0])), "rating" : text.parse_float( extr('total_votes">', "").strip()), "manga_alt" : text.remove_html( extr("Alternative\t\t\n\t
    ", "
    ")).split("; "), "author" : list(text.extract_iter( extr('class="author-content">', "
    "), '"tag">', "")), "artist" : list(text.extract_iter( extr('class="artist-content">', ""), '"tag">', "")), "genres" : list(text.extract_iter( extr('class="genres-content">', ""), '"tag">', "")), "type" : text.remove_html( extr(" Type ", "\n")), "release" : text.parse_int(text.remove_html( extr(" Release ", "\n"))), "status" : text.remove_html( extr(" Status ", "\n")), } ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/mangoxo.py0000644000175000017500000001312515040344700020561 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2019-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.mangoxo.com/""" from .common import Extractor, Message from .. import text, exception from ..cache import cache import hashlib import time class MangoxoExtractor(Extractor): """Base class for mangoxo extractors""" category = "mangoxo" root = "https://www.mangoxo.com" cookies_domain = "www.mangoxo.com" cookies_names = ("SESSION",) _warning = True def login(self): username, password = self._get_auth_info() if username: self.cookies_update(self._login_impl(username, password)) elif MangoxoExtractor._warning: MangoxoExtractor._warning = False self.log.warning("Unauthenticated users cannot see " "more than 5 images per album") @cache(maxage=3*3600, keyarg=1) def _login_impl(self, username, password): self.log.info("Logging in as %s", username) url = self.root + "/login" page = self.request(url).text token = text.extr(page, 'id="loginToken" value="', '"') url = self.root + "/api/login" headers = { "X-Requested-With": "XMLHttpRequest", "Referer": self.root + "/login", } data = self._sign_by_md5(username, password, token) response = self.request(url, method="POST", headers=headers, data=data) data = response.json() if str(data.get("result")) != "1": raise exception.AuthenticationError(data.get("msg")) return {"SESSION": self.cookies.get("SESSION")} def _sign_by_md5(self, username, password, token): # https://dns.mangoxo.com/libs/plugins/phoenix-ui/js/phoenix-ui.js params = [ ("username" , username), ("password" , password), ("token" , token), ("timestamp", str(int(time.time()))), ] query = "&".join("=".join(item) for item in sorted(params)) query += "&secretKey=340836904" sign = hashlib.md5(query.encode()).hexdigest() params.append(("sign", sign.upper())) return params def _total_pages(self, page): return text.parse_int(text.extract(page, "total :", ",")[0]) class MangoxoAlbumExtractor(MangoxoExtractor): """Extractor for albums on mangoxo.com""" subcategory = "album" filename_fmt = "{album[id]}_{num:>03}.{extension}" directory_fmt = ("{category}", "{channel[name]}", "{album[name]}") archive_fmt = "{album[id]}_{num}" pattern = r"(?:https?://)?(?:www\.)?mangoxo\.com/album/(\w+)" example = "https://www.mangoxo.com/album/ID" def __init__(self, match): MangoxoExtractor.__init__(self, match) self.album_id = match[1] def items(self): self.login() url = f"{self.root}/album/{self.album_id}/" page = self.request(url).text data = self.metadata(page) imgs = self.images(url, page) yield Message.Directory, data data["extension"] = None for data["num"], path in enumerate(imgs, 1): data["id"] = text.parse_int(text.extr(path, "=", "&")) url = self.root + "/external/" + path.rpartition("url=")[2] yield Message.Url, url, text.nameext_from_url(url, data) def 
metadata(self, page): """Return general metadata""" extr = text.extract_from(page) title = extr('', '', '<') date = extr('class="fa fa-calendar">', '<') descr = extr('
    ', '
    ') return { "channel": { "id": cid, "name": text.unescape(cname), "cover": cover, }, "album": { "id": self.album_id, "name": text.unescape(title), "date": text.parse_datetime(date.strip(), "%Y.%m.%d %H:%M"), "description": text.unescape(descr), }, "count": text.parse_int(count), } def images(self, url, page): """Generator; Yields all image URLs""" total = self._total_pages(page) num = 1 while True: yield from text.extract_iter( page, 'class="lightgallery-item" href="', '"') if num >= total: return num += 1 page = self.request(url + str(num)).text class MangoxoChannelExtractor(MangoxoExtractor): """Extractor for all albums on a mangoxo channel""" subcategory = "channel" pattern = r"(?:https?://)?(?:www\.)?mangoxo\.com/(\w+)/album" example = "https://www.mangoxo.com/USER/album" def __init__(self, match): MangoxoExtractor.__init__(self, match) self.user = match[1] def items(self): self.login() num = total = 1 url = f"{self.root}/{self.user}/album/" data = {"_extractor": MangoxoAlbumExtractor} while True: page = self.request(url + str(num)).text for album in text.extract_iter( page, '= total: return num += 1 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/mastodon.py0000644000175000017500000003000115040344700020725 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2019-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for Mastodon instances""" from .common import BaseExtractor, Message from .. import text, exception from ..cache import cache class MastodonExtractor(BaseExtractor): """Base class for mastodon extractors""" basecategory = "mastodon" directory_fmt = ("mastodon", "{instance}", "{account[username]}") filename_fmt = "{category}_{id}_{media[id]}.{extension}" archive_fmt = "{media[id]}" def __init__(self, match): BaseExtractor.__init__(self, match) self.item = self.groups[-1] def _init(self): self.instance = self.root.partition("://")[2] self.reblogs = self.config("reblogs", False) self.replies = self.config("replies", True) self.cards = self.config("cards", False) def items(self): for status in self.statuses(): if self._check_moved: self._check_moved(status["account"]) if not self.reblogs and status["reblog"]: self.log.debug("Skipping %s (reblog)", status["id"]) continue if not self.replies and status["in_reply_to_id"]: self.log.debug("Skipping %s (reply)", status["id"]) continue attachments = status["media_attachments"] del status["media_attachments"] if status["reblog"]: attachments.extend(status["reblog"]["media_attachments"]) if self.cards: if card := status.get("card"): if url := card.get("image"): card["weburl"] = card.get("url") card["url"] = url card["id"] = "card" + "".join( url.split("/")[6:-2]).lstrip("0") attachments.append(card) status["instance"] = self.instance acct = status["account"]["acct"] status["instance_remote"] = \ acct.rpartition("@")[2] if "@" in acct else None status["count"] = len(attachments) status["tags"] = [tag["name"] for tag in status["tags"]] status["date"] = text.parse_datetime( status["created_at"][:19], "%Y-%m-%dT%H:%M:%S") yield Message.Directory, status for status["num"], media in enumerate(attachments, 1): status["media"] = media url = media["url"] yield Message.Url, url, text.nameext_from_url(url, status) def statuses(self): """Return an iterable containing all relevant Status objects""" 
return () def _check_moved(self, account): self._check_moved = None # Certain fediverse software (such as Iceshrimp and Sharkey) have a # null account "moved" field instead of not having it outright. # To handle this, check if the "moved" value is truthy instead # if only it exists. if account.get("moved"): self.log.warning("Account '%s' moved to '%s'", account["acct"], account["moved"]["acct"]) BASE_PATTERN = MastodonExtractor.update({ "mastodon.social": { "root" : "https://mastodon.social", "pattern" : r"mastodon\.social", "access-token" : "Y06R36SMvuXXN5_wiPKFAEFiQaMSQg0o_hGgc86Jj48", "client-id" : "dBSHdpsnOUZgxOnjKSQrWEPakO3ctM7HmsyoOd4FcRo", "client-secret": "DdrODTHs_XoeOsNVXnILTMabtdpWrWOAtrmw91wU1zI", }, "pawoo": { "root" : "https://pawoo.net", "pattern" : r"pawoo\.net", "access-token" : "c12c9d275050bce0dc92169a28db09d7" "0d62d0a75a8525953098c167eacd3668", "client-id" : "978a25f843ec01e53d09be2c290cd75c" "782bc3b7fdbd7ea4164b9f3c3780c8ff", "client-secret": "9208e3d4a7997032cf4f1b0e12e5df38" "8428ef1fadb446dcfeb4f5ed6872d97b", }, "baraag": { "root" : "https://baraag.net", "pattern" : r"baraag\.net", "access-token" : "53P1Mdigf4EJMH-RmeFOOSM9gdSDztmrAYFgabOKKE0", "client-id" : "czxx2qilLElYHQ_sm-lO8yXuGwOHxLX9RYYaD0-nq1o", "client-secret": "haMaFdMBgK_-BIxufakmI2gFgkYjqmgXGEO2tB-R2xY", } }) + "(?:/web)?" class MastodonUserExtractor(MastodonExtractor): """Extractor for all images of an account/user""" subcategory = "user" pattern = BASE_PATTERN + r"/(?:@|users/)([^/?#]+)(?:/media)?/?$" example = "https://mastodon.social/@USER" def statuses(self): api = MastodonAPI(self) return api.account_statuses( api.account_id_by_username(self.item), only_media=( not self.reblogs and not self.cards and not self.config("text-posts", False) ), exclude_replies=not self.replies, ) class MastodonBookmarkExtractor(MastodonExtractor): """Extractor for mastodon bookmarks""" subcategory = "bookmark" pattern = BASE_PATTERN + r"/bookmarks" example = "https://mastodon.social/bookmarks" def statuses(self): return MastodonAPI(self).account_bookmarks() class MastodonFavoriteExtractor(MastodonExtractor): """Extractor for mastodon favorites""" subcategory = "favorite" pattern = BASE_PATTERN + r"/favourites" example = "https://mastodon.social/favourites" def statuses(self): return MastodonAPI(self).account_favorites() class MastodonListExtractor(MastodonExtractor): """Extractor for mastodon lists""" subcategory = "list" pattern = BASE_PATTERN + r"/lists/(\w+)" example = "https://mastodon.social/lists/12345" def statuses(self): return MastodonAPI(self).timelines_list(self.item) class MastodonHashtagExtractor(MastodonExtractor): """Extractor for mastodon hashtags""" subcategory = "hashtag" pattern = BASE_PATTERN + r"/tags/(\w+)" example = "https://mastodon.social/tags/NAME" def statuses(self): return MastodonAPI(self).timelines_tag(self.item) class MastodonFollowingExtractor(MastodonExtractor): """Extractor for followed mastodon users""" subcategory = "following" pattern = BASE_PATTERN + r"/(?:@|users/)([^/?#]+)/following" example = "https://mastodon.social/@USER/following" def items(self): api = MastodonAPI(self) account_id = api.account_id_by_username(self.item) for account in api.account_following(account_id): account["_extractor"] = MastodonUserExtractor yield Message.Queue, account["url"], account class MastodonStatusExtractor(MastodonExtractor): """Extractor for images from a status""" subcategory = "status" pattern = (BASE_PATTERN + r"/(?:@[^/?#]+|(?:users/[^/?#]+/)?" 
r"(?:statuses|notice|objects()))/(?!following)([^/?#]+)") example = "https://mastodon.social/@USER/12345" def statuses(self): if self.groups[-2] is not None: url = f"{self.root}/objects/{self.item}" location = self.request_location(url) self.item = location.rpartition("/")[2] return (MastodonAPI(self).status(self.item),) class MastodonAPI(): """Minimal interface for the Mastodon API https://docs.joinmastodon.org/ https://github.com/tootsuite/mastodon """ def __init__(self, extractor): self.root = extractor.root self.extractor = extractor access_token = extractor.config("access-token") if access_token is None or access_token == "cache": access_token = _access_token_cache(extractor.instance) if not access_token: access_token = extractor.config_instance("access-token") if access_token: self.headers = {"Authorization": "Bearer " + access_token} else: self.headers = None def account_id_by_username(self, username): if username.startswith("id:"): return username[3:] try: return self.account_lookup(username)["id"] except Exception: # fall back to account search pass if "@" in username: handle = "@" + username else: handle = f"@{username}@{self.extractor.instance}" for account in self.account_search(handle, 1): if account["acct"] == username: self.extractor._check_moved(account) return account["id"] raise exception.NotFoundError("account") def account_bookmarks(self): """Statuses the user has bookmarked""" endpoint = "/v1/bookmarks" return self._pagination(endpoint, None) def account_favorites(self): """Statuses the user has favourited""" endpoint = "/v1/favourites" return self._pagination(endpoint, None) def account_following(self, account_id): """Accounts which the given account is following""" endpoint = f"/v1/accounts/{account_id}/following" return self._pagination(endpoint, None) def account_lookup(self, username): """Quickly lookup a username to see if it is available""" endpoint = "/v1/accounts/lookup" params = {"acct": username} return self._call(endpoint, params).json() def account_search(self, query, limit=40): """Search for matching accounts by username or display name""" endpoint = "/v1/accounts/search" params = {"q": query, "limit": limit} return self._call(endpoint, params).json() def account_statuses(self, account_id, only_media=True, exclude_replies=False): """Statuses posted to the given account""" endpoint = f"/v1/accounts/{account_id}/statuses" params = {"only_media" : "true" if only_media else "false", "exclude_replies": "true" if exclude_replies else "false"} return self._pagination(endpoint, params) def status(self, status_id): """Obtain information about a status""" endpoint = "/v1/statuses/" + status_id return self._call(endpoint).json() def timelines_list(self, list_id): """View statuses in the given list timeline""" endpoint = "/v1/timelines/list/" + list_id return self._pagination(endpoint, None) def timelines_tag(self, hashtag): """View public statuses containing the given hashtag""" endpoint = "/v1/timelines/tag/" + hashtag return self._pagination(endpoint, None) def _call(self, endpoint, params=None): if endpoint.startswith("http"): url = endpoint else: url = self.root + "/api" + endpoint while True: response = self.extractor.request( url, params=params, headers=self.headers, fatal=None) code = response.status_code if code < 400: return response if code == 401: raise exception.AbortExtraction( f"Invalid or missing access token.\nRun 'gallery-dl oauth:" f"mastodon:{self.extractor.instance}' to obtain one.") if code == 404: raise exception.NotFoundError() if code == 429: 
self.extractor.wait(until=text.parse_datetime( response.headers["x-ratelimit-reset"], "%Y-%m-%dT%H:%M:%S.%fZ", )) continue raise exception.AbortExtraction(response.json().get("error")) def _pagination(self, endpoint, params): url = endpoint while url: response = self._call(url, params) yield from response.json() url = response.links.get("next") if not url: return url = url["url"] params = None @cache(maxage=36500*86400, keyarg=0) def _access_token_cache(instance): return None ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1743510441.0 gallery_dl-1.30.2/gallery_dl/extractor/message.py0000644000175000017500000000345314772755651020566 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2015-2021 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. class Message(): """Enum for message identifiers Extractors yield their results as message-tuples, where the first element is one of the following identifiers. This message-identifier determines the type and meaning of the other elements in such a tuple. - Message.Version: - Message protocol version (currently always '1') - 2nd element specifies the version of all following messages as integer - Message.Directory: - Sets the target directory for all following images - 2nd element is a dictionary containing general metadata - Message.Url: - Image URL and its metadata - 2nd element is the URL as a string - 3rd element is a dictionary with image-specific metadata - Message.Headers: # obsolete - HTTP headers to use while downloading - 2nd element is a dictionary with header-name and -value pairs - Message.Cookies: # obsolete - Cookies to use while downloading - 2nd element is a dictionary with cookie-name and -value pairs - Message.Queue: - (External) URL that should be handled by another extractor - 2nd element is the (external) URL as a string - 3rd element is a dictionary containing URL-specific metadata - Message.Urllist: # obsolete - Same as Message.Url, but its 2nd element is a list of multiple URLs - The additional URLs serve as a fallback if the primary one fails """ Version = 1 Directory = 2 Url = 3 # Headers = 4 # Cookies = 5 Queue = 6 # Urllist = 7 # Metadata = 8 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/misskey.py0000644000175000017500000002033515040344700020576 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for Misskey instances""" from .common import BaseExtractor, Message, Dispatch from .. 
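# A minimal sketch of a consumer for the message protocol documented in the
# Message class above: extractors yield tuples whose first element is one of
# the Message identifiers, and a downstream job dispatches on it.
# `extractor` and `download` are placeholders for any object with an items()
# generator and any callable that stores a file; they are not part of this
# module.
def process(extractor, download):
    directory = {}
    for msg in extractor.items():
        if msg[0] == Message.Directory:
            directory = msg[1]          # general metadata for following URLs
        elif msg[0] == Message.Url:
            url, metadata = msg[1], msg[2]
            download(url, {**directory, **metadata})
        elif msg[0] == Message.Queue:
            pass                        # hand msg[1] to another extractor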
import text, exception from ..cache import memcache class MisskeyExtractor(BaseExtractor): """Base class for Misskey extractors""" basecategory = "misskey" directory_fmt = ("misskey", "{instance}", "{user[username]}") filename_fmt = "{category}_{id}_{file[id]}.{extension}" archive_fmt = "{id}_{file[id]}" def __init__(self, match): BaseExtractor.__init__(self, match) self.item = self.groups[-1] def _init(self): self.api = MisskeyAPI(self) self.instance = self.root.rpartition("://")[2] self.renotes = self.config("renotes", False) self.replies = self.config("replies", True) def items(self): for note in self.notes(): if "note" in note: note = note["note"] files = note.pop("files") or [] if renote := note.get("renote"): if not self.renotes: self.log.debug("Skipping %s (renote)", note["id"]) continue files.extend(renote.get("files") or ()) if reply := note.get("reply"): if not self.replies: self.log.debug("Skipping %s (reply)", note["id"]) continue files.extend(reply.get("files") or ()) note["instance"] = self.instance note["instance_remote"] = note["user"]["host"] note["count"] = len(files) note["date"] = text.parse_datetime( note["createdAt"], "%Y-%m-%dT%H:%M:%S.%f%z") yield Message.Directory, note for note["num"], file in enumerate(files, 1): file["date"] = text.parse_datetime( file["createdAt"], "%Y-%m-%dT%H:%M:%S.%f%z") note["file"] = file url = file["url"] yield Message.Url, url, text.nameext_from_url(url, note) def notes(self): """Return an iterable containing all relevant Note objects""" return () def _make_note(self, type, user, url): # extract real URL from potential proxy path, sep, query = url.partition("?") if sep: url = text.parse_query(query).get("url") or path return { "id" : type, "user" : user, "files": ({ "id" : url.rpartition("/")[2].partition(".")[0], # ID from URL "url": url, "createdAt": "", },), "createdAt": "", } BASE_PATTERN = MisskeyExtractor.update({ "misskey.io": { "root": "https://misskey.io", "pattern": r"misskey\.io", }, "misskey.design": { "root": "https://misskey.design", "pattern": r"misskey\.design", }, "lesbian.energy": { "root": "https://lesbian.energy", "pattern": r"lesbian\.energy", }, "sushi.ski": { "root": "https://sushi.ski", "pattern": r"sushi\.ski", }, }) class MisskeyUserExtractor(Dispatch, MisskeyExtractor): """Extractor for all images of a Misskey user""" subcategory = "user" pattern = BASE_PATTERN + r"/@([^/?#]+)/?$" example = "https://misskey.io/@USER" def items(self): base = f"{self.root}/@{self.item}/" return self._dispatch_extractors(( (MisskeyInfoExtractor , base + "info"), (MisskeyAvatarExtractor , base + "avatar"), (MisskeyBackgroundExtractor, base + "banner"), (MisskeyNotesExtractor , base + "notes"), ), ("notes",)) class MisskeyNotesExtractor(MisskeyExtractor): """Extractor for a Misskey user's notes""" subcategory = "notes" pattern = BASE_PATTERN + r"/@([^/?#]+)/notes" example = "https://misskey.io/@USER/notes" def notes(self): return self.api.users_notes(self.api.user_id_by_username(self.item)) class MisskeyInfoExtractor(MisskeyExtractor): """Extractor for a Misskey user's profile data""" subcategory = "info" pattern = BASE_PATTERN + r"/@([^/?#]+)/info" example = "https://misskey.io/@USER/info" def items(self): user = self.api.users_show(self.item) return iter(((Message.Directory, user),)) class MisskeyAvatarExtractor(MisskeyExtractor): """Extractor for a Misskey user's avatar""" subcategory = "avatar" pattern = BASE_PATTERN + r"/@([^/?#]+)/avatar" example = "https://misskey.io/@USER/avatar" def notes(self): user = 
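# A standalone sketch of the proxy-URL unwrapping done in _make_note() above:
# some instances serve avatars and banners through a media proxy whose real
# target is carried in a 'url' query parameter.  Uses only the standard
# library; the example URL is a placeholder.
from urllib.parse import urlsplit, parse_qs

def unwrap_proxy_url(url):
    parts = urlsplit(url)
    if parts.query:
        target = parse_qs(parts.query).get("url")
        if target:
            return target[0]
        # no 'url' parameter: fall back to the path without the query string
        return url.partition("?")[0]
    return url

# unwrap_proxy_url(
#     "https://example.tld/proxy/a.webp?url=https%3A%2F%2Fcdn.example%2Fa.png")
# -> "https://cdn.example/a.png"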
self.api.users_show(self.item) url = user.get("avatarUrl") return (self._make_note("avatar", user, url),) if url else () class MisskeyBackgroundExtractor(MisskeyExtractor): """Extractor for a Misskey user's banner image""" subcategory = "background" pattern = BASE_PATTERN + r"/@([^/?#]+)/ba(?:nner|ckground)" example = "https://misskey.io/@USER/banner" def notes(self): user = self.api.users_show(self.item) url = user.get("bannerUrl") return (self._make_note("background", user, url),) if url else () class MisskeyFollowingExtractor(MisskeyExtractor): """Extractor for followed Misskey users""" subcategory = "following" pattern = BASE_PATTERN + r"/@([^/?#]+)/following" example = "https://misskey.io/@USER/following" def items(self): user_id = self.api.user_id_by_username(self.item) for user in self.api.users_following(user_id): user = user["followee"] url = f"{self.root}/@{user['username']}" if (host := user["host"]) is not None: url = f"{url}@{host}" user["_extractor"] = MisskeyUserExtractor yield Message.Queue, url, user class MisskeyNoteExtractor(MisskeyExtractor): """Extractor for images from a Note""" subcategory = "note" pattern = BASE_PATTERN + r"/notes/(\w+)" example = "https://misskey.io/notes/98765" def notes(self): return (self.api.notes_show(self.item),) class MisskeyFavoriteExtractor(MisskeyExtractor): """Extractor for favorited notes""" subcategory = "favorite" pattern = BASE_PATTERN + r"/(?:my|api/i)/favorites" example = "https://misskey.io/my/favorites" def notes(self): return self.api.i_favorites() class MisskeyAPI(): """Interface for Misskey API https://github.com/misskey-dev/misskey https://misskey-hub.net/en/docs/api/ https://misskey-hub.net/docs/api/endpoints.html """ def __init__(self, extractor): self.root = extractor.root self.extractor = extractor self.access_token = extractor.config("access-token") def user_id_by_username(self, username): return self.users_show(username)["id"] def users_following(self, user_id): endpoint = "/users/following" data = {"userId": user_id} return self._pagination(endpoint, data) def users_notes(self, user_id): endpoint = "/users/notes" data = {"userId": user_id} return self._pagination(endpoint, data) @memcache(keyarg=1) def users_show(self, username): endpoint = "/users/show" username, _, host = username.partition("@") data = {"username": username, "host": host or None} return self._call(endpoint, data) def notes_show(self, note_id): endpoint = "/notes/show" data = {"noteId": note_id} return self._call(endpoint, data) def i_favorites(self): endpoint = "/i/favorites" if not self.access_token: raise exception.AuthenticationError() data = {"i": self.access_token} return self._pagination(endpoint, data) def _call(self, endpoint, data): url = f"{self.root}/api{endpoint}" return self.extractor.request_json(url, method="POST", json=data) def _pagination(self, endpoint, data): data["limit"] = 100 while True: notes = self._call(endpoint, data) if not notes: return yield from notes data["untilId"] = notes[-1]["id"] ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/moebooru.py0000644000175000017500000001346115040344700020743 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2020-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for Moebooru based sites""" from .booru import BooruExtractor from .. 
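# A minimal sketch of the 'untilId' keyset pagination used by
# MisskeyAPI._pagination() above, written against `requests`.  `root` and
# `endpoint` are placeholders for an instance root and an API endpoint such
# as '/users/notes'.
import requests

def paginate_notes(root, endpoint, data, limit=100):
    data = dict(data, limit=limit)
    while True:
        notes = requests.post(f"{root}/api{endpoint}", json=data).json()
        if not notes:
            return
        yield from notes
        # request the next (older) batch: everything before the last note ID
        data["untilId"] = notes[-1]["id"]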
import text, util import collections import datetime class MoebooruExtractor(BooruExtractor): """Base class for Moebooru extractors""" basecategory = "moebooru" filename_fmt = "{category}_{id}_{md5}.{extension}" page_start = 1 def _prepare(self, post): post["date"] = text.parse_timestamp(post["created_at"]) def _html(self, post): url = f"{self.root}/post/show/{post['id']}" return self.request(url).text def _tags(self, post, page): tag_container = text.extr(page, '")), "genres" : text.split_html(extr( "", "")), "tags" : text.split_html(extr( "", "")), "uploader" : text.remove_html(extr( "", "")), "language" : extr(" ", "\n"), } ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/toyhouse.py0000644000175000017500000000766415040344700021003 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2022-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://toyhou.se/""" from .common import Extractor, Message from .. import text, util BASE_PATTERN = r"(?:https?://)?(?:www\.)?toyhou\.se" class ToyhouseExtractor(Extractor): """Base class for toyhouse extractors""" category = "toyhouse" root = "https://toyhou.se" directory_fmt = ("{category}", "{user|artists!S}") archive_fmt = "{id}" def __init__(self, match): Extractor.__init__(self, match) self.user = match[1] self.offset = 0 def items(self): metadata = self.metadata() for post in util.advance(self.posts(), self.offset): if metadata: post.update(metadata) text.nameext_from_url(post["url"], post) post["id"], _, post["hash"] = post["filename"].partition("_") yield Message.Directory, post yield Message.Url, post["url"], post def posts(self): return () def metadata(self): return None def skip(self, num): self.offset += num return num def _parse_post(self, post, needle='\n
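# A small sketch of the offset/skip mechanism used by ToyhouseExtractor
# above: skip(n) only records an offset, and items() then advances the post
# iterator past that many entries.  `islice` (the standard itertools
# "consume" recipe) is the standard-library equivalent of the util.advance()
# helper used there.
from itertools import islice

def advance(iterable, num):
    """Skip the first 'num' items of 'iterable' without materializing them"""
    iterator = iter(iterable)
    next(islice(iterator, num, num), None)
    return iterator

# list(advance(range(10), 3)) -> [3, 4, 5, 6, 7, 8, 9]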
    ', '<'), "%d %b %Y, %I:%M:%S %p"), "artists": [ text.remove_html(artist) for artist in extr( '
    ', '
    \n
    ').split( '
    ') ], "characters": text.split_html(extr( '
    '))[2:], } def _pagination(self, path): url = self.root + path params = {"page": 1} while True: page = self.request(url, params=params).text cnt = 0 for post in text.extract_iter( page, ''): cnt += 1 yield self._parse_post(post) if not cnt and params["page"] == 1: if self._accept_content_warning(page): continue return if cnt < 18: return params["page"] += 1 def _accept_content_warning(self, page): pos = page.find(' name="_token"') + 1 token, pos = text.extract(page, ' value="', '"', pos) user , pos = text.extract(page, ' value="', '"', pos) if not token or not user: return False data = {"_token": token, "user": user} self.request(self.root + "/~account/warnings/accept", method="POST", data=data, allow_redirects=False) return True class ToyhouseArtExtractor(ToyhouseExtractor): """Extractor for artworks of a toyhouse user""" subcategory = "art" pattern = BASE_PATTERN + r"/([^/?#]+)/art" example = "https://www.toyhou.se/USER/art" def posts(self): return self._pagination(f"/{self.user}/art") def metadata(self): return {"user": self.user} class ToyhouseImageExtractor(ToyhouseExtractor): """Extractor for individual toyhouse images""" subcategory = "image" pattern = (r"(?:https?://)?(?:" r"(?:www\.)?toyhou\.se/~images|" r"f\d+\.toyhou\.se/file/[^/?#]+/(?:image|watermark)s" r")/(\d+)") example = "https://toyhou.se/~images/12345" def posts(self): url = f"{self.root}/~images/{self.user}" return (self._parse_post( self.request(url).text, '', '
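# A rough, standalone sketch of the content-warning acceptance flow in
# _accept_content_warning() above: read the hidden '_token' value and the
# value of the input that follows it, then POST them back while keeping
# cookies in a session.  The form field names mirror the code above; the
# regexes and the example URLs are assumptions.
import re
import requests

def accept_content_warning(session, root, page_html):
    pos = page_html.find(' name="_token"')
    if pos < 0:
        return False
    token_match = re.search(r' value="([^"]+)"', page_html[pos:])
    if not token_match:
        return False
    user_match = re.search(r' value="([^"]+)"',
                           page_html[pos + token_match.end():])
    if not user_match:
        return False
    session.post(root + "/~account/warnings/accept",
                 data={"_token": token_match[1], "user": user_match[1]},
                 allow_redirects=False)
    return True

# session = requests.Session()
# page = session.get("https://toyhou.se/USER/art").text
# accept_content_warning(session, "https://toyhou.se", page)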
    ')), "date" : text.parse_datetime( extr('id="Uploaded">', '
    ').strip(), "%Y %B %d"), "rating" : text.parse_float(extr( 'id="Rating">', '').partition(" ")[0]), "type" : text.remove_html(extr('id="Category">' , '')), "collection": text.remove_html(extr('id="Collection">', '')), "group" : text.split_html(extr('id="Group">' , '')), "artist" : text.split_html(extr('id="Artist">' , '')), "parody" : text.split_html(extr('id="Parody">' , '')), "characters": text.split_html(extr('id="Character">' , '')), "tags" : text.split_html(extr('id="Tag">' , '')), "language" : "English", "lang" : "en", } def images(self, page): url = f"{self.root}/Read/Index/{self.gallery_id}?page=1" headers = {"Referer": self.page_url} response = self.request(url, headers=headers, fatal=False) if "/Auth/" in response.url: raise exception.AbortExtraction( f"Failed to get gallery JSON data. Visit '{response.url}' " f"in a browser and solve the CAPTCHA to continue.") page = response.text tpl, pos = text.extract(page, 'data-cdn="', '"') cnt, pos = text.extract(page, '> of ', '<', pos) base, _, params = text.unescape(tpl).partition("[PAGE]") return [ (base + str(i) + params, None) for i in range(1, text.parse_int(cnt)+1) ] class TsuminoSearchExtractor(TsuminoBase, Extractor): """Extractor for search results on tsumino.com""" subcategory = "search" pattern = r"(?i)(?:https?://)?(?:www\.)?tsumino\.com/(?:Books/?)?#(.+)" example = "https://www.tsumino.com/Books#QUERY" def __init__(self, match): Extractor.__init__(self, match) self.query = match[1] def items(self): for gallery in self.galleries(): url = f"{self.root}/entry/{gallery['id']}" gallery["_extractor"] = TsuminoGalleryExtractor yield Message.Queue, url, gallery def galleries(self): """Return all gallery results matching 'self.query'""" url = f"{self.root}/Search/Operate?type=Book" headers = { "Referer": f"{self.root}/", "X-Requested-With": "XMLHttpRequest", } data = { "PageNumber": 1, "Text": "", "Sort": "Newest", "List": "0", "Length": "0", "MinimumRating": "0", "ExcludeList": "0", "CompletelyExcludeHated": "false", } data.update(self._parse(self.query)) while True: info = self.request_json( url, method="POST", headers=headers, data=data) for gallery in info["data"]: yield gallery["entry"] if info["pageNumber"] >= info["pageCount"]: return data["PageNumber"] += 1 def _parse(self, query): if not query: return {} try: if query[0] == "?": return self._parse_simple(query) return self._parse_jsurl(query) except Exception as exc: raise exception.AbortExtraction( f"Invalid search query '{query}' ({exc})") def _parse_simple(self, query): """Parse search query with format '?=value>'""" key, _, value = query.partition("=") tag_types = { "Tag": "1", "Category": "2", "Collection": "3", "Group": "4", "Artist": "5", "Parody": "6", "Character": "7", "Uploader": "100", } return { "Tags[0][Type]": tag_types[key[1:].capitalize()], "Tags[0][Text]": text.unquote(value).replace("+", " "), "Tags[0][Exclude]": "false", } def _parse_jsurl(self, data): """Parse search query in JSURL format Nested lists and dicts are handled in a special way to deal with the way Tsumino expects its parameters -> expand(...) 
Example: ~(name~'John*20Doe~age~42~children~(~'Mary~'Bill)) Ref: https://github.com/Sage/jsurl """ i = 0 imax = len(data) def eat(expected): nonlocal i if data[i] != expected: raise ValueError( f"bad JSURL syntax: expected '{expected}', got {data[i]}") i += 1 def decode(): nonlocal i beg = i result = "" while i < imax: ch = data[i] if ch not in "~)*!": i += 1 elif ch == "*": if beg < i: result += data[beg:i] if data[i + 1] == "*": result += chr(int(data[i+2:i+6], 16)) i += 6 else: result += chr(int(data[i+1:i+3], 16)) i += 3 beg = i elif ch == "!": if beg < i: result += data[beg:i] result += "$" i += 1 beg = i else: break return result + data[beg:i] def parse_one(): nonlocal i eat('~') result = "" ch = data[i] if ch == "(": i += 1 if data[i] == "~": result = [] if data[i+1] == ")": i += 1 else: result.append(parse_one()) while data[i] == "~": result.append(parse_one()) else: result = {} if data[i] != ")": while True: key = decode() value = parse_one() for ekey, evalue in expand(key, value): result[ekey] = evalue if data[i] != "~": break i += 1 eat(")") elif ch == "'": i += 1 result = decode() else: beg = i i += 1 while i < imax and data[i] not in "~)": i += 1 sub = data[beg:i] if ch in "0123456789-": fval = float(sub) ival = int(fval) result = ival if ival == fval else fval else: if sub not in ("true", "false", "null"): raise ValueError("bad value keyword: " + sub) result = sub return result def expand(key, value): if isinstance(value, list): for index, cvalue in enumerate(value): ckey = f"{key}[{index}]" yield from expand(ckey, cvalue) elif isinstance(value, dict): for ckey, cvalue in value.items(): ckey = f"{key}[{ckey}]" yield from expand(ckey, cvalue) else: yield key, value return parse_one() ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/tumblr.py0000644000175000017500000004747715040344700020437 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2016-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.tumblr.com/""" from .common import Extractor, Message from .. import text, util, oauth, exception from datetime import datetime, date, timedelta BASE_PATTERN = ( r"(?:tumblr:(?:https?://)?([^/]+)|" r"(?:https?://)?" 
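# A standalone illustration of the expand() helper defined in _parse_jsurl()
# above: nested lists and dicts decoded from a JSURL query are flattened into
# the bracketed form-field names that Tsumino's /Search/Operate endpoint
# expects.  The sample input below is illustrative only.
def expand(key, value):
    if isinstance(value, list):
        for index, item in enumerate(value):
            yield from expand(f"{key}[{index}]", item)
    elif isinstance(value, dict):
        for subkey, item in value.items():
            yield from expand(f"{key}[{subkey}]", item)
    else:
        yield key, value

# dict(expand("Tags", [{"Type": "1", "Text": "example tag"}])) ->
# {'Tags[0][Type]': '1', 'Tags[0][Text]': 'example tag'}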
r"(?:(?:www\.)?tumblr\.com/(?:blog/(?:view/)?)?([\w-]+)|" r"([\w-]+\.tumblr\.com)))" ) POST_TYPES = frozenset(("text", "quote", "link", "answer", "video", "audio", "photo", "chat", "search")) class TumblrExtractor(Extractor): """Base class for tumblr extractors""" category = "tumblr" directory_fmt = ("{category}", "{blog_name}") filename_fmt = "{category}_{blog_name}_{id}_{num:>02}.{extension}" archive_fmt = "{id}_{num}" def __init__(self, match): Extractor.__init__(self, match) if name := match[2]: self.blog = name + ".tumblr.com" else: self.blog = match[1] or match[3] def _init(self): self.api = TumblrAPI(self) self.types = self._setup_posttypes() self.avatar = self.config("avatar", False) self.inline = self.config("inline", True) self.reblogs = self.config("reblogs", True) self.external = self.config("external", False) self.original = self.config("original", True) self.fallback_delay = self.config("fallback-delay", 120.0) self.fallback_retries = self.config("fallback-retries", 2) if len(self.types) == 1: self.api.posts_type = next(iter(self.types)) elif not self.types: self.log.warning("no valid post types selected") if self.reblogs == "same-blog": self._skip_reblog = self._skip_reblog_same_blog self.date_min, self.api.before = self._get_date_min_max(0, None) def items(self): blog = None # pre-compile regular expressions self._sub_video = util.re( r"https?://((?:vt|vtt|ve)(?:\.media)?\.tumblr\.com" r"/tumblr_[^_]+)_\d+\.([0-9a-z]+)").sub if self.inline: self._sub_image = util.re( r"https?://(\d+\.media\.tumblr\.com(?:/[0-9a-f]+)?" r"/tumblr(?:_inline)?_[^_]+)_\d+\.([0-9a-z]+)").sub self._subn_orig_image = util.re(r"/s\d+x\d+/").subn _findall_image = util.re(' post["timestamp"]: return if post["type"] not in self.types: continue if "blog" in post: blog = post["blog"] self.blog = blog["name"] + ".tumblr.com" else: if not blog: blog = self.api.info(self.blog) blog["uuid"] = self.blog if self.avatar: url = self.api.avatar(self.blog) yield Message.Directory, {"blog": blog} yield self._prepare_avatar(url, post.copy(), blog) post["blog"] = blog reblog = "reblogged_from_id" in post if reblog and self._skip_reblog(post): continue post["reblogged"] = reblog if "trail" in post: del post["trail"] post["date"] = text.parse_timestamp(post["timestamp"]) posts = [] if "photos" in post: # type "photo" or "link" photos = post["photos"] del post["photos"] for photo in photos: post["photo"] = photo best_photo = photo["original_size"] for alt_photo in photo["alt_sizes"]: if (alt_photo["height"] > best_photo["height"] or alt_photo["width"] > best_photo["width"]): best_photo = alt_photo photo.update(best_photo) if self.original and "/s2048x3072/" in photo["url"] and ( photo["width"] == 2048 or photo["height"] == 3072): photo["url"], fb = self._original_photo(photo["url"]) if fb: post["_fallback"] = self._original_image_fallback( photo["url"], post["id"]) del photo["original_size"] del photo["alt_sizes"] posts.append( self._prepare_image(photo["url"], post.copy())) del post["photo"] post.pop("_fallback", None) url = post.get("audio_url") # type "audio" if url and url.startswith("https://a.tumblr.com/"): posts.append(self._prepare(url, post.copy())) if url := post.get("video_url"): # type "video" posts.append(self._prepare( self._original_video(url), post.copy())) if self.inline and "reblog" in post: # inline media # only "chat" posts are missing a "reblog" key in their # API response, but they can't contain images/videos anyway body = post["reblog"]["comment"] + post["reblog"]["tree_html"] for url in 
_findall_image(body): url, fb = self._original_inline_image(url) if fb: post["_fallback"] = self._original_image_fallback( url, post["id"]) posts.append(self._prepare_image(url, post.copy())) post.pop("_fallback", None) for url in _findall_video(body): url = self._original_video(url) posts.append(self._prepare(url, post.copy())) if self.external: # external links if url := post.get("permalink_url") or post.get("url"): post["extension"] = None posts.append((Message.Queue, url, post.copy())) del post["extension"] post["count"] = len(posts) yield Message.Directory, post for num, (msg, url, post) in enumerate(posts, 1): post["num"] = num post["count"] = len(posts) yield msg, url, post def posts(self): """Return an iterable containing all relevant posts""" def _setup_posttypes(self): types = self.config("posts", "all") if types == "all": return POST_TYPES elif not types: return frozenset() else: if isinstance(types, str): types = types.split(",") types = frozenset(types) if invalid := types - POST_TYPES: types = types & POST_TYPES self.log.warning("Invalid post types: '%s'", "', '".join(sorted(invalid))) return types def _prepare(self, url, post): text.nameext_from_url(url, post) post["hash"] = post["filename"].partition("_")[2] return Message.Url, url, post def _prepare_image(self, url, post): text.nameext_from_url(url, post) # try ".gifv" (#3095) # it's unknown whether all gifs in this case are actually webps # incorrect extensions will be corrected by 'adjust-extensions' if post["extension"] == "gif": post["_fallback"] = (url + "v",) post["_http_headers"] = {"Accept": # copied from chrome 106 "image/avif,image/webp,image/apng," "image/svg+xml,image/*,*/*;q=0.8"} parts = post["filename"].split("_") try: post["hash"] = parts[1] if parts[1] != "inline" else parts[2] except IndexError: # filename doesn't follow the usual pattern (#129) post["hash"] = post["filename"] return Message.Url, url, post def _prepare_avatar(self, url, post, blog): text.nameext_from_url(url, post) post["num"] = post["count"] = 1 post["blog"] = blog post["reblogged"] = False post["type"] = post["id"] = post["hash"] = "avatar" return Message.Url, url, post def _skip_reblog(self, _): return not self.reblogs def _skip_reblog_same_blog(self, post): return self.blog != post.get("reblogged_root_uuid") def _original_photo(self, url): resized = url.replace("/s2048x3072/", "/s99999x99999/", 1) return self._update_image_token(resized) def _original_inline_image(self, url): if self.original: resized, n = self._subn_orig_image("/s99999x99999/", url, 1) if n: return self._update_image_token(resized) return self._sub_image(r"https://\1_1280.\2", url), False def _original_video(self, url): return self._sub_video(r"https://\1.\2", url) def _update_image_token(self, resized): headers = {"Accept": "text/html,*/*;q=0.8"} try: response = self.request(resized, headers=headers) except Exception: return resized, True else: updated = text.extr(response.text, '" src="', '"') return updated, (resized == updated) def _original_image_fallback(self, url, post_id): for _ in util.repeat(self.fallback_retries): self.sleep(self.fallback_delay, "image token") yield self._update_image_token(url)[0] self.log.warning("Unable to fetch higher-resolution " "version of %s (%s)", url, post_id) class TumblrUserExtractor(TumblrExtractor): """Extractor for a Tumblr user's posts""" subcategory = "user" pattern = BASE_PATTERN + r"(?:/page/\d+|/archive)?/?$" example = "https://www.tumblr.com/BLOG" def posts(self): return self.api.posts(self.blog, {}) class 
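# A hedged sketch of the higher-resolution lookup performed by
# _original_photo() and _update_image_token() above: rewrite the size
# component of the media URL and request the HTML variant to obtain a URL
# with a fresh token.  The Accept header and the '" src="' marker mirror the
# code above; everything else is a placeholder, not the module's own logic.
import requests

def original_photo_url(url):
    resized = url.replace("/s2048x3072/", "/s99999x99999/", 1)
    response = requests.get(
        resized, headers={"Accept": "text/html,*/*;q=0.8"})
    updated = response.text.partition('" src="')[2].partition('"')[0]
    # an empty or unchanged result means the rewrite did not help
    return updated or resized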
TumblrPostExtractor(TumblrExtractor): """Extractor for a single Tumblr post""" subcategory = "post" pattern = BASE_PATTERN + r"/(?:post/|image/)?(\d+)" example = "https://www.tumblr.com/BLOG/12345" def __init__(self, match): TumblrExtractor.__init__(self, match) self.post_id = match[4] self.reblogs = True self.date_min = 0 def posts(self): return self.api.posts(self.blog, {"id": self.post_id}) def _setup_posttypes(self): return POST_TYPES class TumblrTagExtractor(TumblrExtractor): """Extractor for Tumblr user's posts by tag""" subcategory = "tag" pattern = BASE_PATTERN + r"/tagged/([^/?#]+)" example = "https://www.tumblr.com/BLOG/tagged/TAG" def __init__(self, match): TumblrExtractor.__init__(self, match) self.tag = text.unquote(match[4].replace("-", " ")) def posts(self): return self.api.posts(self.blog, {"tag": self.tag}) class TumblrDayExtractor(TumblrExtractor): """Extractor for Tumblr user's posts by day""" subcategory = "day" pattern = BASE_PATTERN + r"/day/(\d\d\d\d/\d\d/\d\d)" example = "https://www.tumblr.com/BLOG/day/1970/01/01" def __init__(self, match): TumblrExtractor.__init__(self, match) year, month, day = match[4].split("/") self.ordinal = date(int(year), int(month), int(day)).toordinal() def _init(self): TumblrExtractor._init(self) self.date_min = ( # 719163 == date(1970, 1, 1).toordinal() (self.ordinal - 719163) * 86400) self.api.before = self.date_min + 86400 def posts(self): return self.api.posts(self.blog, {}) class TumblrLikesExtractor(TumblrExtractor): """Extractor for a Tumblr user's liked posts""" subcategory = "likes" directory_fmt = ("{category}", "{blog_name}", "likes") archive_fmt = "f_{blog[name]}_{id}_{num}" pattern = BASE_PATTERN + r"/likes" example = "https://www.tumblr.com/BLOG/likes" def posts(self): return self.api.likes(self.blog) class TumblrSearchExtractor(TumblrExtractor): """Extractor for a Tumblr search""" subcategory = "search" pattern = (r"(?:https?://)?(?:www\.)?tumblr\.com/search/([^/?#]+)" r"(?:/([^/?#]+)(?:/([^/?#]+))?)?(?:/?\?([^#]+))?") example = "https://www.tumblr.com/search/QUERY" def posts(self): _, _, _, search, mode, post_type, query = self.groups params = text.parse_query(query) return self.api.search(text.unquote(search), params, mode, post_type) class TumblrAPI(oauth.OAuth1API): """Interface for the Tumblr API v2 https://github.com/tumblr/docs/blob/master/api.md """ ROOT = "https://api.tumblr.com" API_KEY = "O3hU2tMi5e4Qs5t3vezEi6L0qRORJ5y9oUpSGsrWu8iA3UCc3B" API_SECRET = "sFdsK3PDdP2QpYMRAoq0oDnw0sFS24XigXmdfnaeNZpJpqAn03" BLOG_CACHE = {} def __init__(self, extractor): oauth.OAuth1API.__init__(self, extractor) self.posts_type = self.before = None def info(self, blog): """Return general information about a blog""" try: return self.BLOG_CACHE[blog] except KeyError: endpoint = f"/v2/blog/{blog}/info" params = {"api_key": self.api_key} if self.api_key else None self.BLOG_CACHE[blog] = blog = self._call(endpoint, params)["blog"] return blog def avatar(self, blog, size="512"): """Retrieve a blog avatar""" if self.api_key: return (f"{self.ROOT}/v2/blog/{blog}/avatar/{size}" f"?api_key={self.api_key}") endpoint = f"/v2/blog/{blog}/avatar" params = {"size": size} return self._call( endpoint, params, allow_redirects=False)["avatar_url"] def posts(self, blog, params): """Retrieve published posts""" params["offset"] = self.extractor.config("offset") params["limit"] = 50 params["reblog_info"] = "true" params["type"] = self.posts_type params["before"] = self.before if self.before and params["offset"]: self.log.warning("'offset' and 'date-max' 
cannot be used together") endpoint = f"/v2/blog/{blog}/posts" return self._pagination(endpoint, params, blog=blog, cache=True) def likes(self, blog): """Retrieve liked posts""" endpoint = f"/v2/blog/{blog}/likes" params = {"limit": "50", "before": self.before} if self.api_key: params["api_key"] = self.api_key while True: posts = self._call(endpoint, params)["liked_posts"] if not posts: return yield from posts params["before"] = posts[-1]["liked_timestamp"] def search(self, query, params, mode="top", post_type=None): """Retrieve search results""" endpoint = "/v2/timeline/search" params["limit"] = "50" params["days"] = params.pop("t", None) params["query"] = query params["mode"] = mode params["reblog_info"] = "true" if self.extractor.reblogs else "false" if post_type: params["post_type_filter"] = post_type return self._pagination(endpoint, params) def _call(self, endpoint, params, **kwargs): url = self.ROOT + endpoint kwargs["params"] = params while True: response = self.request(url, **kwargs) try: data = response.json() except ValueError: data = response.text status = response.status_code else: status = data["meta"]["status"] if 200 <= status < 400: return data["response"] self.log.debug(data) if status == 403: raise exception.AuthorizationError() elif status == 404: try: error = data["errors"][0]["detail"] board = ("only viewable within the Tumblr dashboard" in error) except Exception: board = False if board: if self.api_key is None: self.log.info( "Ensure your 'access-token' and " "'access-token-secret' belong to the same " "application as 'api-key' and 'api-secret'") else: self.log.info("Run 'gallery-dl oauth:tumblr' " "to access dashboard-only blogs") raise exception.AuthorizationError(error) raise exception.NotFoundError("user or post") elif status == 429: # daily rate limit if response.headers.get("x-ratelimit-perday-remaining") == "0": self.log.info("Daily API rate limit exceeded") reset = response.headers.get("x-ratelimit-perday-reset") api_key = self.api_key or self.session.auth.consumer_key if api_key == self.API_KEY: self.log.info( "Register your own OAuth application and use its " "credentials to prevent this error: " "https://gdl-org.github.io/docs/configuration.html" "#extractor-tumblr-api-key-api-secret") if self.extractor.config("ratelimit") == "wait": self.extractor.wait(seconds=reset) continue t = (datetime.now() + timedelta(0, float(reset))).time() raise exception.AbortExtraction( f"Aborting - Rate limit will reset at " f"{t.hour:02}:{t.minute:02}:{t.second:02}") # hourly rate limit if reset := response.headers.get("x-ratelimit-perhour-reset"): self.log.info("Hourly API rate limit exceeded") self.extractor.wait(seconds=reset) continue raise exception.AbortExtraction(data) def _pagination(self, endpoint, params, blog=None, key="posts", cache=False): if self.api_key: params["api_key"] = self.api_key strategy = self.extractor.config("pagination") if not strategy and "offset" not in params: strategy = "api" while True: data = self._call(endpoint, params) if "timeline" in data: data = data["timeline"] posts = data["elements"] else: if cache: self.BLOG_CACHE[blog] = data["blog"] cache = False posts = data[key] yield from posts if strategy == "api": try: endpoint = data["_links"]["next"]["href"] except KeyError: return params = None if self.api_key: endpoint += "&api_key=" + self.api_key elif strategy == "before": if not posts: return timestamp = posts[-1]["timestamp"] + 1 if params["before"] and timestamp >= params["before"]: return params["before"] = timestamp params["offset"] 
= None else: # offset params["offset"] = \ text.parse_int(params["offset"]) + params["limit"] params["before"] = None if params["offset"] >= data["total_posts"]: return ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/tumblrgallery.py0000644000175000017500000001064515040344700022002 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://tumblrgallery.xyz/""" from .common import GalleryExtractor from .. import text BASE_PATTERN = r"(?:https?://)?tumblrgallery\.xyz" class TumblrgalleryExtractor(GalleryExtractor): """Base class for tumblrgallery extractors""" category = "tumblrgallery" filename_fmt = "{category}_{gallery_id}_{num:>03}_{id}.{extension}" directory_fmt = ("{category}", "{gallery_id} {title}") root = "https://tumblrgallery.xyz" referer = False def _urls_from_page(self, page): return text.extract_iter( page, '
    ", "")), "gallery_id": self.gallery_id, } def images(self, _): base = f"{self.root}/tumblrblog/gallery/{self.gallery_id}/" pnum = 1 while True: url = f"{base}{pnum}.html" response = self.request(url, allow_redirects=False, fatal=False) if response.status_code >= 300: return for url in self._urls_from_page(response.text): yield url, self._data_from_url(url) pnum += 1 class TumblrgalleryPostExtractor(TumblrgalleryExtractor): """Extractor for Posts on tumblrgallery.xyz""" subcategory = "post" pattern = BASE_PATTERN + r"(/post/(\d+)\.html)" example = "https://tumblrgallery.xyz/post/12345.html" def __init__(self, match): TumblrgalleryExtractor.__init__(self, match) self.gallery_id = text.parse_int(match[2]) def metadata(self, page): return { "title" : text.remove_html( text.unescape(text.extr(page, "", "")) ).replace("_", "-"), "gallery_id": self.gallery_id, } def images(self, page): for url in self._urls_from_page(page): yield url, self._data_from_url(url) class TumblrgallerySearchExtractor(TumblrgalleryExtractor): """Extractor for Search result on tumblrgallery.xyz""" subcategory = "search" filename_fmt = "{category}_{num:>03}_{gallery_id}_{id}_{title}.{extension}" directory_fmt = ("{category}", "{search_term}") pattern = BASE_PATTERN + r"(/s\.php\?q=([^&#]+))" example = "https://tumblrgallery.xyz/s.php?q=QUERY" def __init__(self, match): TumblrgalleryExtractor.__init__(self, match) self.search_term = match[2] def metadata(self, page): return { "search_term": self.search_term, } def images(self, _): page_url = "s.php?q=" + self.search_term while True: page = self.request(self.root + "/" + page_url).text for gallery_id in text.extract_iter( page, '
    ", "") )).replace("_", "-") yield url, data next_url = text.extr( page, ' = 400: continue if url := text.extr( response.text, 'name="twitter:image" value="', '"'): files.append({"url": url}) def _transform_tweet(self, tweet): if "author" in tweet: author = tweet["author"] elif "core" in tweet: author = tweet["core"]["user_results"]["result"] else: author = tweet["user"] author = self._transform_user(author) if "legacy" in tweet: legacy = tweet["legacy"] else: legacy = tweet tget = legacy.get tweet_id = int(legacy["id_str"]) if tweet_id >= 300000000000000: date = text.parse_timestamp( ((tweet_id >> 22) + 1288834974657) // 1000) else: try: date = text.parse_datetime( legacy["created_at"], "%a %b %d %H:%M:%S %z %Y") except Exception: date = util.NONE source = tweet.get("source") tdata = { "tweet_id" : tweet_id, "retweet_id" : text.parse_int( tget("retweeted_status_id_str")), "quote_id" : text.parse_int( tget("quoted_by_id_str")), "reply_id" : text.parse_int( tget("in_reply_to_status_id_str")), "conversation_id": text.parse_int( tget("conversation_id_str")), "source_id" : 0, "date" : date, "author" : author, "user" : self._user or author, "lang" : legacy["lang"], "source" : text.extr(source, ">", "<") if source else "", "sensitive" : tget("possibly_sensitive"), "sensitive_flags": tget("sensitive_flags"), "favorite_count": tget("favorite_count"), "quote_count" : tget("quote_count"), "reply_count" : tget("reply_count"), "retweet_count" : tget("retweet_count"), "bookmark_count": tget("bookmark_count"), } if "views" in tweet: try: tdata["view_count"] = int(tweet["views"]["count"]) except Exception: tdata["view_count"] = 0 else: tdata["view_count"] = 0 if "note_tweet" in tweet: note = tweet["note_tweet"]["note_tweet_results"]["result"] content = note["text"] entities = note["entity_set"] else: content = tget("full_text") or tget("text") or "" entities = legacy["entities"] if hashtags := entities.get("hashtags"): tdata["hashtags"] = [t["text"] for t in hashtags] if mentions := entities.get("user_mentions"): tdata["mentions"] = [{ "id": text.parse_int(u["id_str"]), "name": u["screen_name"], "nick": u["name"], } for u in mentions] content = text.unescape(content) if urls := entities.get("urls"): for url in urls: try: content = content.replace(url["url"], url["expanded_url"]) except KeyError: pass txt, _, tco = content.rpartition(" ") tdata["content"] = txt if tco.startswith("https://t.co/") else content if "birdwatch_pivot" in tweet: try: tdata["birdwatch"] = \ tweet["birdwatch_pivot"]["subtitle"]["text"] except KeyError: self.log.debug("Unable to extract 'birdwatch' note from %s", tweet["birdwatch_pivot"]) if "in_reply_to_screen_name" in legacy: tdata["reply_to"] = legacy["in_reply_to_screen_name"] if "quoted_by" in legacy: tdata["quote_by"] = legacy["quoted_by"] if "extended_entities" in legacy: self._extract_media_source( tdata, legacy["extended_entities"]["media"][0]) if tdata["retweet_id"]: tdata["content"] = f"RT @{author['name']}: {tdata['content']}" tdata["date_original"] = text.parse_timestamp( ((tdata["retweet_id"] >> 22) + 1288834974657) // 1000) return tdata def _transform_user(self, user): try: uid = user.get("rest_id") or user["id_str"] except KeyError: # private/invalid user (#4349) return {} try: return self._user_cache[uid] except KeyError: pass if "legacy" in user: user = user["legacy"] uget = user.get if uget("withheld_scope"): self.log.warning("'%s'", uget("description")) entities = user["entities"] self._user_cache[uid] = udata = { "id" : text.parse_int(uid), "name" : 
user["screen_name"], "nick" : user["name"], "location" : uget("location"), "date" : text.parse_datetime( uget("created_at"), "%a %b %d %H:%M:%S %z %Y"), "verified" : uget("verified", False), "protected" : uget("protected", False), "profile_banner" : uget("profile_banner_url", ""), "profile_image" : uget( "profile_image_url_https", "").replace("_normal.", "."), "favourites_count": uget("favourites_count"), "followers_count" : uget("followers_count"), "friends_count" : uget("friends_count"), "listed_count" : uget("listed_count"), "media_count" : uget("media_count"), "statuses_count" : uget("statuses_count"), } descr = user["description"] if urls := entities["description"].get("urls"): for url in urls: try: descr = descr.replace(url["url"], url["expanded_url"]) except KeyError: pass udata["description"] = descr if "url" in entities: url = entities["url"]["urls"][0] udata["url"] = url.get("expanded_url") or url.get("url") return udata def _assign_user(self, user): self._user_obj = user self._user = self._transform_user(user) def _users_result(self, users): userfmt = self.config("users") if not userfmt or userfmt == "user": cls = TwitterUserExtractor fmt = (self.root + "/i/user/{rest_id}").format_map elif userfmt == "timeline": cls = TwitterTimelineExtractor fmt = (self.root + "/id:{rest_id}/timeline").format_map elif userfmt == "media": cls = TwitterMediaExtractor fmt = (self.root + "/id:{rest_id}/media").format_map elif userfmt == "tweets": cls = TwitterTweetsExtractor fmt = (self.root + "/id:{rest_id}/tweets").format_map else: cls = None fmt = userfmt.format_map for user in users: user["_extractor"] = cls yield Message.Queue, fmt(user), user def _expand_tweets(self, tweets): seen = set() for tweet in tweets: obj = tweet["legacy"] if "legacy" in tweet else tweet cid = obj.get("conversation_id_str") if not cid: tid = obj["id_str"] self.log.warning( "Unable to expand %s (no 'conversation_id')", tid) continue if cid in seen: self.log.debug( "Skipping expansion of %s (previously seen)", cid) continue seen.add(cid) try: yield from self.api.tweet_detail(cid) except Exception: yield tweet def _make_tweet(self, user, url, id_str): return { "id_str": id_str, "lang": None, "user": user, "source": "><", "entities": {}, "extended_entities": { "media": [ { "original_info": {}, "media_url": url, }, ], }, } def _init_cursor(self): cursor = self.config("cursor", True) if not cursor: self._update_cursor = util.identity elif isinstance(cursor, str): return cursor def _update_cursor(self, cursor): self.log.debug("Cursor: %s", cursor) self._cursor = cursor return cursor def metadata(self): """Return general metadata""" return {} def tweets(self): """Yield all relevant tweet objects""" def finalize(self): if self._cursor: self.log.info("Use '-o cursor=%s' to continue downloading " "from the current position", self._cursor) def login(self): if self.cookies_check(self.cookies_names): return username, password = self._get_auth_info() if username: return self.cookies_update(_login_impl(self, username, password)) class TwitterUserExtractor(Dispatch, TwitterExtractor): """Extractor for a Twitter user""" pattern = (BASE_PATTERN + r"/(?!search)(?:([^/?#]+)/?(?:$|[?#])" r"|i(?:/user/|ntent/user\?user_id=)(\d+))") example = "https://x.com/USER" def items(self): user, user_id = self.groups if user_id is not None: user = "id:" + user_id base = f"{self.root}/{user}/" return self._dispatch_extractors(( (TwitterInfoExtractor , base + "info"), (TwitterAvatarExtractor , base + "photo"), (TwitterBackgroundExtractor, base + 
"header_photo"), (TwitterTimelineExtractor , base + "timeline"), (TwitterTweetsExtractor , base + "tweets"), (TwitterMediaExtractor , base + "media"), (TwitterRepliesExtractor , base + "with_replies"), (TwitterLikesExtractor , base + "likes"), ), ("timeline",)) class TwitterTimelineExtractor(TwitterExtractor): """Extractor for a Twitter user timeline""" subcategory = "timeline" pattern = BASE_PATTERN + r"/(?!search)([^/?#]+)/timeline(?!\w)" example = "https://x.com/USER/timeline" def _init_cursor(self): if self._cursor: return self._cursor.partition("/")[2] or None return None def _update_cursor(self, cursor): if cursor: self._cursor = self._cursor_prefix + cursor self.log.debug("Cursor: %s", self._cursor) else: self._cursor = None return cursor def tweets(self): reset = False cursor = self.config("cursor", True) if not cursor: self._update_cursor = util.identity elif isinstance(cursor, str): self._cursor = cursor else: cursor = None if cursor: state = cursor.partition("/")[0] state, _, tweet_id = state.partition("_") state = text.parse_int(state, 1) else: state = 1 if state <= 1: self._cursor_prefix = "1/" # yield initial batch of (media) tweets tweet = None for tweet in self._select_tweet_source()(self.user): yield tweet if tweet is None and not cursor: return tweet_id = tweet["rest_id"] state = reset = 2 else: self.api._user_id_by_screen_name(self.user) # build search query query = f"from:{self._user['name']} max_id:{tweet_id}" if self.retweets: query += " include:retweets include:nativeretweets" if state <= 2: self._cursor_prefix = f"2_{tweet_id}/" if reset: self._cursor = self._cursor_prefix if not self.textonly: # try to search for media-only tweets tweet = None for tweet in self.api.search_timeline(query + " filter:links"): yield tweet if tweet is not None: return self._update_cursor(None) state = reset = 3 if state <= 3: # yield unfiltered search results self._cursor_prefix = f"3_{tweet_id}/" if reset: self._cursor = self._cursor_prefix yield from self.api.search_timeline(query) return self._update_cursor(None) def _select_tweet_source(self): strategy = self.config("strategy") if strategy is None or strategy == "auto": if self.retweets or self.textonly: return self.api.user_tweets else: return self.api.user_media if strategy == "tweets": return self.api.user_tweets if strategy == "media": return self.api.user_media if strategy == "with_replies": return self.api.user_tweets_and_replies raise exception.AbortExtraction(f"Invalid strategy '{strategy}'") class TwitterTweetsExtractor(TwitterExtractor): """Extractor for Tweets from a user's Tweets timeline""" subcategory = "tweets" pattern = BASE_PATTERN + r"/(?!search)([^/?#]+)/tweets(?!\w)" example = "https://x.com/USER/tweets" def tweets(self): return self.api.user_tweets(self.user) class TwitterRepliesExtractor(TwitterExtractor): """Extractor for Tweets from a user's timeline including replies""" subcategory = "replies" pattern = BASE_PATTERN + r"/(?!search)([^/?#]+)/with_replies(?!\w)" example = "https://x.com/USER/with_replies" def tweets(self): return self.api.user_tweets_and_replies(self.user) class TwitterMediaExtractor(TwitterExtractor): """Extractor for Tweets from a user's Media timeline""" subcategory = "media" pattern = BASE_PATTERN + r"/(?!search)([^/?#]+)/media(?!\w)" example = "https://x.com/USER/media" def tweets(self): return self.api.user_media(self.user) class TwitterLikesExtractor(TwitterExtractor): """Extractor for liked tweets""" subcategory = "likes" pattern = BASE_PATTERN + r"/(?!search)([^/?#]+)/likes(?!\w)" 
example = "https://x.com/USER/likes" def metadata(self): return {"user_likes": self.user} def tweets(self): return self.api.user_likes(self.user) class TwitterBookmarkExtractor(TwitterExtractor): """Extractor for bookmarked tweets""" subcategory = "bookmark" pattern = BASE_PATTERN + r"/i/bookmarks()" example = "https://x.com/i/bookmarks" def tweets(self): return self.api.user_bookmarks() def _transform_tweet(self, tweet): tdata = TwitterExtractor._transform_tweet(self, tweet) tdata["date_bookmarked"] = text.parse_timestamp( (int(tweet["sortIndex"] or 0) >> 20) // 1000) return tdata class TwitterListExtractor(TwitterExtractor): """Extractor for Twitter lists""" subcategory = "list" pattern = BASE_PATTERN + r"/i/lists/(\d+)/?$" example = "https://x.com/i/lists/12345" def tweets(self): return self.api.list_latest_tweets_timeline(self.user) class TwitterListMembersExtractor(TwitterExtractor): """Extractor for members of a Twitter list""" subcategory = "list-members" pattern = BASE_PATTERN + r"/i/lists/(\d+)/members" example = "https://x.com/i/lists/12345/members" def items(self): self.login() return self._users_result(TwitterAPI(self).list_members(self.user)) class TwitterFollowingExtractor(TwitterExtractor): """Extractor for followed users""" subcategory = "following" pattern = BASE_PATTERN + r"/(?!search)([^/?#]+)/following(?!\w)" example = "https://x.com/USER/following" def items(self): self.login() return self._users_result(TwitterAPI(self).user_following(self.user)) class TwitterFollowersExtractor(TwitterExtractor): """Extractor for a user's followers""" subcategory = "followers" pattern = BASE_PATTERN + r"/(?!search)([^/?#]+)/followers(?!\w)" example = "https://x.com/USER/followers" def items(self): self.login() return self._users_result(TwitterAPI(self).user_followers(self.user)) class TwitterSearchExtractor(TwitterExtractor): """Extractor for Twitter search results""" subcategory = "search" pattern = BASE_PATTERN + r"/search/?\?(?:[^&#]+&)*q=([^&#]+)" example = "https://x.com/search?q=QUERY" def metadata(self): return {"search": text.unquote(self.user)} def tweets(self): query = text.unquote(self.user.replace("+", " ")) user = None for item in query.split(): item = item.strip("()") if item.startswith("from:"): if user: user = None break else: user = item[5:] if user is not None: try: self._assign_user(self.api.user_by_screen_name(user)) except KeyError: pass return self.api.search_timeline(query) class TwitterHashtagExtractor(TwitterExtractor): """Extractor for Twitter hashtags""" subcategory = "hashtag" pattern = BASE_PATTERN + r"/hashtag/([^/?#]+)" example = "https://x.com/hashtag/NAME" def items(self): url = f"{self.root}/search?q=%23{self.user}" data = {"_extractor": TwitterSearchExtractor} yield Message.Queue, url, data class TwitterCommunityExtractor(TwitterExtractor): """Extractor for a Twitter community""" subcategory = "community" pattern = BASE_PATTERN + r"/i/communities/(\d+)" example = "https://x.com/i/communities/12345" def tweets(self): if self.textonly: return self.api.community_tweets_timeline(self.user) return self.api.community_media_timeline(self.user) class TwitterCommunitiesExtractor(TwitterExtractor): """Extractor for followed Twitter communities""" subcategory = "communities" pattern = BASE_PATTERN + r"/([^/?#]+)/communities/?$" example = "https://x.com/i/communities" def tweets(self): return self.api.communities_main_page_timeline(self.user) class TwitterEventExtractor(TwitterExtractor): """Extractor for Tweets from a Twitter Event""" subcategory = "event" 
directory_fmt = ("{category}", "Events", "{event[id]} {event[short_title]}") pattern = BASE_PATTERN + r"/i/events/(\d+)" example = "https://x.com/i/events/12345" def metadata(self): return {"event": self.api.live_event(self.user)} def tweets(self): return self.api.live_event_timeline(self.user) class TwitterTweetExtractor(TwitterExtractor): """Extractor for individual tweets""" subcategory = "tweet" pattern = (BASE_PATTERN + r"/([^/?#]+|i/web)/status/(\d+)" r"/?(?:$|\?|#|photo/|video/)") example = "https://x.com/USER/status/12345" def __init__(self, match): TwitterExtractor.__init__(self, match) self.tweet_id = match[2] def tweets(self): if conversations := self.config("conversations"): self._accessible = (conversations == "accessible") return self._tweets_conversation(self.tweet_id) endpoint = self.config("tweet-endpoint") if endpoint == "detail" or endpoint in (None, "auto") and \ self.api.headers["x-twitter-auth-type"]: return self._tweets_detail(self.tweet_id) return self._tweets_single(self.tweet_id) def _tweets_single(self, tweet_id): tweet = self.api.tweet_result_by_rest_id(tweet_id) try: self._assign_user(tweet["core"]["user_results"]["result"]) except KeyError: raise exception.AbortExtraction( f"'{tweet.get('reason') or 'Unavailable'}'") yield tweet if not self.quoted: return while True: tweet_id = tweet["legacy"].get("quoted_status_id_str") if not tweet_id: break tweet = self.api.tweet_result_by_rest_id(tweet_id) tweet["legacy"]["quoted_by_id_str"] = tweet_id yield tweet def _tweets_detail(self, tweet_id): tweets = [] for tweet in self.api.tweet_detail(tweet_id): if tweet["rest_id"] == tweet_id or \ tweet.get("_retweet_id_str") == tweet_id: if self._user_obj is None: self._assign_user(tweet["core"]["user_results"]["result"]) tweets.append(tweet) tweet_id = tweet["legacy"].get("quoted_status_id_str") if not tweet_id: break return tweets def _tweets_conversation(self, tweet_id): tweets = self.api.tweet_detail(tweet_id) buffer = [] for tweet in tweets: buffer.append(tweet) if tweet["rest_id"] == tweet_id or \ tweet.get("_retweet_id_str") == tweet_id: self._assign_user(tweet["core"]["user_results"]["result"]) break else: # initial Tweet not accessible if self._accessible: return () return buffer return itertools.chain(buffer, tweets) class TwitterQuotesExtractor(TwitterExtractor): """Extractor for quotes of a Tweet""" subcategory = "quotes" pattern = BASE_PATTERN + r"/(?:[^/?#]+|i/web)/status/(\d+)/quotes" example = "https://x.com/USER/status/12345/quotes" def items(self): url = f"{self.root}/search?q=quoted_tweet_id:{self.user}" data = {"_extractor": TwitterSearchExtractor} yield Message.Queue, url, data class TwitterInfoExtractor(TwitterExtractor): """Extractor for a user's profile data""" subcategory = "info" pattern = BASE_PATTERN + r"/(?!search)([^/?#]+)/info" example = "https://x.com/USER/info" def items(self): api = TwitterAPI(self) screen_name = self.user if screen_name.startswith("id:"): user = api.user_by_rest_id(screen_name[3:]) else: user = api.user_by_screen_name(screen_name) return iter(((Message.Directory, self._transform_user(user)),)) class TwitterAvatarExtractor(TwitterExtractor): subcategory = "avatar" filename_fmt = "avatar {date}.{extension}" archive_fmt = "AV_{user[id]}_{date}" pattern = BASE_PATTERN + r"/(?!search)([^/?#]+)/photo" example = "https://x.com/USER/photo" def tweets(self): self.api._user_id_by_screen_name(self.user) user = self._user_obj url = user["legacy"]["profile_image_url_https"] if url == ("https://abs.twimg.com/sticky" 
"/default_profile_images/default_profile_normal.png"): return () url = url.replace("_normal.", ".") id_str = url.rsplit("/", 2)[1] return (self._make_tweet(user, url, id_str),) class TwitterBackgroundExtractor(TwitterExtractor): subcategory = "background" filename_fmt = "background {date}.{extension}" archive_fmt = "BG_{user[id]}_{date}" pattern = BASE_PATTERN + r"/(?!search)([^/?#]+)/header_photo" example = "https://x.com/USER/header_photo" def tweets(self): self.api._user_id_by_screen_name(self.user) user = self._user_obj try: url = user["legacy"]["profile_banner_url"] _, timestamp = url.rsplit("/", 1) except (KeyError, ValueError): return () id_str = str((int(timestamp) * 1000 - 1288834974657) << 22) return (self._make_tweet(user, url, id_str),) class TwitterImageExtractor(Extractor): category = "twitter" subcategory = "image" pattern = r"https?://pbs\.twimg\.com/media/([\w-]+)(?:\?format=|\.)(\w+)" example = "https://pbs.twimg.com/media/ABCDE?format=jpg&name=orig" def __init__(self, match): Extractor.__init__(self, match) self.id, self.fmt = match.groups() TwitterExtractor._init_sizes(self) def items(self): base = f"https://pbs.twimg.com/media/{self.id}?format={self.fmt}&name=" data = { "filename": self.id, "extension": self.fmt, "_fallback": TwitterExtractor._image_fallback(self, base), } yield Message.Directory, data yield Message.Url, base + self._size_image, data class TwitterAPI(): client_transaction = None def __init__(self, extractor): self.extractor = extractor self.log = extractor.log self.root = "https://x.com/i/api" self._nsfw_warning = True self._json_dumps = util.json_dumps cookies = extractor.cookies cookies_domain = extractor.cookies_domain csrf = extractor.config("csrf") if csrf is None or csrf == "cookies": csrf_token = cookies.get("ct0", domain=cookies_domain) else: csrf_token = None if not csrf_token: csrf_token = util.generate_token() cookies.set("ct0", csrf_token, domain=cookies_domain) auth_token = cookies.get("auth_token", domain=cookies_domain) self.headers = { "Accept": "*/*", "Referer": extractor.root + "/", "content-type": "application/json", "x-guest-token": None, "x-twitter-auth-type": "OAuth2Session" if auth_token else None, "x-csrf-token": csrf_token, "x-twitter-client-language": "en", "x-twitter-active-user": "yes", "x-client-transaction-id": None, "Sec-Fetch-Dest": "empty", "Sec-Fetch-Mode": "cors", "Sec-Fetch-Site": "same-origin", "authorization": "Bearer AAAAAAAAAAAAAAAAAAAAANRILgAAAAAAnNwIzUejR" "COuH5E6I8xnZz4puTs%3D1Zv7ttfk8LF81IUq16cHjhLTvJu" "4FA33AGWWjCpTnA", } self.params = { "include_profile_interstitial_type": "1", "include_blocking": "1", "include_blocked_by": "1", "include_followed_by": "1", "include_want_retweets": "1", "include_mute_edge": "1", "include_can_dm": "1", "include_can_media_tag": "1", "include_ext_has_nft_avatar": "1", "include_ext_is_blue_verified": "1", "include_ext_verified_type": "1", "skip_status": "1", "cards_platform": "Web-12", "include_cards": "1", "include_ext_alt_text": "true", "include_ext_limited_action_results": "false", "include_quote_count": "true", "include_reply_count": "1", "tweet_mode": "extended", "include_ext_collab_control": "true", "include_ext_views": "true", "include_entities": "true", "include_user_entities": "true", "include_ext_media_color": "true", "include_ext_media_availability": "true", "include_ext_sensitive_media_warning": "true", "include_ext_trusted_friends_metadata": "true", "send_error_codes": "true", "simple_quoted_tweet": "true", "q": None, "count": "100", "query_source": None, 
"cursor": None, "pc": None, "spelling_corrections": None, "include_ext_edit_control": "true", "ext": "mediaStats,highlightedLabel,hasNftAvatar,voiceInfo," "enrichments,superFollowMetadata,unmentionInfo,editControl," "collab_control,vibe", } self.features = { "hidden_profile_subscriptions_enabled": True, "profile_label_improvements_pcf_label_in_post_enabled": True, "rweb_tipjar_consumption_enabled": True, "responsive_web_graphql_exclude_directive_enabled": True, "verified_phone_label_enabled": False, "highlights_tweets_tab_ui_enabled": True, "responsive_web_twitter_article_notes_tab_enabled": True, "subscriptions_feature_can_gift_premium": True, "creator_subscriptions_tweet_preview_api_enabled": True, "responsive_web_graphql_" "skip_user_profile_image_extensions_enabled": False, "responsive_web_graphql_" "timeline_navigation_enabled": True, } self.features_pagination = { "rweb_video_screen_enabled": False, "profile_label_improvements_pcf_label_in_post_enabled": True, "rweb_tipjar_consumption_enabled": True, "responsive_web_graphql_exclude_directive_enabled": True, "verified_phone_label_enabled": False, "creator_subscriptions_tweet_preview_api_enabled": True, "responsive_web_graphql_" "timeline_navigation_enabled": True, "responsive_web_graphql_" "skip_user_profile_image_extensions_enabled": False, "premium_content_api_read_enabled": False, "communities_web_enable_tweet_community_results_fetch": True, "c9s_tweet_anatomy_moderator_badge_enabled": True, "responsive_web_grok_analyze_button_fetch_trends_enabled": False, "responsive_web_grok_analyze_post_followups_enabled": True, "responsive_web_jetfuel_frame": False, "responsive_web_grok_share_attachment_enabled": True, "articles_preview_enabled": True, "responsive_web_edit_tweet_api_enabled": True, "graphql_is_translatable_rweb_tweet_is_translatable_enabled": True, "view_counts_everywhere_api_enabled": True, "longform_notetweets_consumption_enabled": True, "responsive_web_twitter_article_tweet_consumption_enabled": True, "tweet_awards_web_tipping_enabled": False, "responsive_web_grok_show_grok_translated_post": False, "responsive_web_grok_analysis_button_from_backend": True, "creator_subscriptions_quote_tweet_preview_enabled": False, "freedom_of_speech_not_reach_fetch_enabled": True, "standardized_nudges_misinfo": True, "tweet_with_visibility_results_" "prefer_gql_limited_actions_policy_enabled": True, "longform_notetweets_rich_text_read_enabled": True, "longform_notetweets_inline_media_enabled": True, "responsive_web_grok_image_annotation_enabled": True, "responsive_web_enhance_cards_enabled": False, } def tweet_result_by_rest_id(self, tweet_id): endpoint = "/graphql/Vg2Akr5FzUmF0sTplA5k6g/TweetResultByRestId" variables = { "tweetId": tweet_id, "withCommunity": False, "includePromotedContent": False, "withVoice": False, } field_toggles = { "withArticleRichContentState": True, "withArticlePlainText": False, "withGrokAnalyze": False, "withDisallowedReplyControls": False, } params = { "variables" : self._json_dumps(variables), "features" : self._json_dumps(self.features_pagination), "fieldToggles": self._json_dumps(field_toggles), } tweet = self._call(endpoint, params)["data"]["tweetResult"]["result"] if "tweet" in tweet: tweet = tweet["tweet"] if tweet.get("__typename") == "TweetUnavailable": reason = tweet.get("reason") if reason == "NsfwLoggedOut": raise exception.AuthorizationError("NSFW Tweet") if reason == "Protected": raise exception.AuthorizationError("Protected Tweet") raise exception.AbortExtraction(f"Tweet unavailable ('{reason}')") 
return tweet def tweet_detail(self, tweet_id): endpoint = "/graphql/b9Yw90FMr_zUb8DvA8r2ug/TweetDetail" variables = { "focalTweetId": tweet_id, "referrer": "profile", "with_rux_injections": False, # "rankingMode": "Relevance", "includePromotedContent": False, "withCommunity": True, "withQuickPromoteEligibilityTweetFields": False, "withBirdwatchNotes": True, "withVoice": True, } field_toggles = { "withArticleRichContentState": True, "withArticlePlainText": False, "withGrokAnalyze": False, "withDisallowedReplyControls": False, } return self._pagination_tweets( endpoint, variables, ("threaded_conversation_with_injections_v2",), field_toggles=field_toggles) def user_tweets(self, screen_name): endpoint = "/graphql/M3Hpkrb8pjWkEuGdLeXMOA/UserTweets" variables = { "userId": self._user_id_by_screen_name(screen_name), "count": 100, "includePromotedContent": False, "withQuickPromoteEligibilityTweetFields": False, "withVoice": True, } field_toggles = { "withArticlePlainText": False, } return self._pagination_tweets( endpoint, variables, field_toggles=field_toggles) def user_tweets_and_replies(self, screen_name): endpoint = "/graphql/pz0IHaV_t7T4HJavqqqcIA/UserTweetsAndReplies" variables = { "userId": self._user_id_by_screen_name(screen_name), "count": 100, "includePromotedContent": False, "withCommunity": True, "withVoice": True, } field_toggles = { "withArticlePlainText": False, } return self._pagination_tweets( endpoint, variables, field_toggles=field_toggles) def user_media(self, screen_name): endpoint = "/graphql/8B9DqlaGvYyOvTCzzZWtNA/UserMedia" variables = { "userId": self._user_id_by_screen_name(screen_name), "count": 100, "includePromotedContent": False, "withClientEventToken": False, "withBirdwatchNotes": False, "withVoice": True, } field_toggles = { "withArticlePlainText": False, } return self._pagination_tweets( endpoint, variables, field_toggles=field_toggles) def user_likes(self, screen_name): endpoint = "/graphql/uxjTlmrTI61zreSIV1urbw/Likes" variables = { "userId": self._user_id_by_screen_name(screen_name), "count": 100, "includePromotedContent": False, "withClientEventToken": False, "withBirdwatchNotes": False, "withVoice": True, } field_toggles = { "withArticlePlainText": False, } return self._pagination_tweets( endpoint, variables, field_toggles=field_toggles) def user_bookmarks(self): endpoint = "/graphql/ztCdjqsvvdL0dE8R5ME0hQ/Bookmarks" variables = { "count": 100, "includePromotedContent": False, } return self._pagination_tweets( endpoint, variables, ("bookmark_timeline_v2", "timeline"), False) def list_latest_tweets_timeline(self, list_id): endpoint = "/graphql/LSefrrxhpeX8HITbKfWz9g/ListLatestTweetsTimeline" variables = { "listId": list_id, "count": 100, } return self._pagination_tweets( endpoint, variables, ("list", "tweets_timeline", "timeline")) def search_timeline(self, query, product="Latest"): endpoint = "/graphql/fL2MBiqXPk5pSrOS5ACLdA/SearchTimeline" variables = { "rawQuery": query, "count": 100, "querySource": "typed_query", "product": product, } return self._pagination_tweets( endpoint, variables, ("search_by_raw_query", "search_timeline", "timeline")) def community_tweets_timeline(self, community_id): endpoint = "/graphql/awszcpgwaIeqqNfmzjxUow/CommunityTweetsTimeline" variables = { "communityId": community_id, "count": 100, "displayLocation": "Community", "rankingMode": "Recency", "withCommunity": True, } return self._pagination_tweets( endpoint, variables, ("communityResults", "result", "ranked_community_timeline", "timeline")) def community_media_timeline(self, 
community_id): endpoint = "/graphql/HfMuDHto2j3NKUeiLjKWHA/CommunityMediaTimeline" variables = { "communityId": community_id, "count": 100, "withCommunity": True, } return self._pagination_tweets( endpoint, variables, ("communityResults", "result", "community_media_timeline", "timeline")) def communities_main_page_timeline(self, screen_name): endpoint = ("/graphql/NbdrKPY_h_nlvZUg7oqH5Q" "/CommunitiesMainPageTimeline") variables = { "count": 100, "withCommunity": True, } return self._pagination_tweets( endpoint, variables, ("viewer", "communities_timeline", "timeline")) def live_event_timeline(self, event_id): endpoint = f"/2/live_event/timeline/{event_id}.json" params = self.params.copy() params["timeline_id"] = "recap" params["urt"] = "true" params["get_annotations"] = "true" return self._pagination_legacy(endpoint, params) def live_event(self, event_id): endpoint = f"/1.1/live_event/1/{event_id}/timeline.json" params = self.params.copy() params["count"] = "0" params["urt"] = "true" return (self._call(endpoint, params) ["twitter_objects"]["live_events"][event_id]) def list_members(self, list_id): endpoint = "/graphql/v97svwb-qcBmzv6QruDuNg/ListMembers" variables = { "listId": list_id, "count": 100, } return self._pagination_users( endpoint, variables, ("list", "members_timeline", "timeline")) def user_followers(self, screen_name): endpoint = "/graphql/jqZ0_HJBA6mnu18iTZYm9w/Followers" variables = { "userId": self._user_id_by_screen_name(screen_name), "count": 100, "includePromotedContent": False, } return self._pagination_users(endpoint, variables) def user_followers_verified(self, screen_name): endpoint = "/graphql/GHg0X_FjrJoISwwLPWi1LQ/BlueVerifiedFollowers" variables = { "userId": self._user_id_by_screen_name(screen_name), "count": 100, "includePromotedContent": False, } return self._pagination_users(endpoint, variables) def user_following(self, screen_name): endpoint = "/graphql/4QHbs4wmzgtU91f-t96_Eg/Following" variables = { "userId": self._user_id_by_screen_name(screen_name), "count": 100, "includePromotedContent": False, } return self._pagination_users(endpoint, variables) @memcache(keyarg=1) def user_by_rest_id(self, rest_id): endpoint = "/graphql/5vdJ5sWkbSRDiiNZvwc2Yg/UserByRestId" features = self.features params = { "variables": self._json_dumps({ "userId": rest_id, }), "features": self._json_dumps(features), } return self._call(endpoint, params)["data"]["user"]["result"] @memcache(keyarg=1) def user_by_screen_name(self, screen_name): endpoint = "/graphql/32pL5BWe9WKeSK1MoPvFQQ/UserByScreenName" features = self.features.copy() features["subscriptions_verification_info_" "is_identity_verified_enabled"] = True features["subscriptions_verification_info_" "verified_since_enabled"] = True params = { "variables": self._json_dumps({ "screen_name": screen_name, }), "features": self._json_dumps(features), "fieldToggles": self._json_dumps({ "withAuxiliaryUserLabels": True, }), } return self._call(endpoint, params)["data"]["user"]["result"] def _user_id_by_screen_name(self, screen_name): user = () try: if screen_name.startswith("id:"): user = self.user_by_rest_id(screen_name[3:]) else: user = self.user_by_screen_name(screen_name) self.extractor._assign_user(user) return user["rest_id"] except KeyError: if "unavailable_message" in user: raise exception.NotFoundError( f"{user['unavailable_message'].get('text')} " f"({user.get('reason')})", False) else: raise exception.NotFoundError("user") @cache(maxage=3600) def _guest_token(self): endpoint = "/1.1/guest/activate.json" 
self.log.info("Requesting guest token") return str(self._call( endpoint, None, "POST", False, "https://api.x.com", )["guest_token"]) def _authenticate_guest(self): guest_token = self._guest_token() if guest_token != self.headers["x-guest-token"]: self.headers["x-guest-token"] = guest_token self.extractor.cookies.set( "gt", guest_token, domain=self.extractor.cookies_domain) @cache(maxage=10800) def _client_transaction(self): self.log.info("Initializing client transaction keys") from .. import transaction_id ct = transaction_id.ClientTransaction() ct.initialize(self.extractor) # update 'x-csrf-token' header (#7467) csrf_token = self.extractor.cookies.get( "ct0", domain=self.extractor.cookies_domain) if csrf_token: self.headers["x-csrf-token"] = csrf_token return ct def _transaction_id(self, url, method="GET"): if self.client_transaction is None: TwitterAPI.client_transaction = self._client_transaction() path = url[url.find("/", 8):] self.headers["x-client-transaction-id"] = \ self.client_transaction.generate_transaction_id(method, path) def _call(self, endpoint, params, method="GET", auth=True, root=None): url = (root or self.root) + endpoint while True: if auth: if self.headers["x-twitter-auth-type"]: self._transaction_id(url, method) else: self._authenticate_guest() response = self.extractor.request( url, method=method, params=params, headers=self.headers, fatal=None) # update 'x-csrf-token' header (#1170) if csrf_token := response.cookies.get("ct0"): self.headers["x-csrf-token"] = csrf_token remaining = int(response.headers.get("x-rate-limit-remaining", 6)) if remaining < 6 and remaining <= random.randrange(1, 6): self._handle_ratelimit(response) continue try: data = response.json() except ValueError: data = {"errors": ({"message": response.text},)} errors = data.get("errors") if not errors: return data retry = False for error in errors: msg = error.get("message") or "Unspecified" self.log.debug("API error: '%s'", msg) if "this account is temporarily locked" in msg: msg = "Account temporarily locked" if self.extractor.config("locked") != "wait": raise exception.AuthorizationError(msg) self.log.warning(msg) self.extractor.input("Press ENTER to retry.") retry = True elif "Could not authenticate you" in msg: if not self.extractor.config("relogin", True): continue username, password = self.extractor._get_auth_info() if not username: continue _login_impl.invalidate(username) self.extractor.cookies_update( _login_impl(self.extractor, username, password)) self.__init__(self.extractor) retry = True elif msg.lower().startswith("timeout"): retry = True if retry: if self.headers["x-twitter-auth-type"]: self.log.debug("Retrying API request") continue else: # fall through to "Login Required" response.status_code = 404 if response.status_code < 400: return data elif response.status_code in (403, 404) and \ not self.headers["x-twitter-auth-type"]: raise exception.AuthorizationError("Login required") elif response.status_code == 429: self._handle_ratelimit(response) continue # error try: errors = ", ".join(e["message"] for e in errors) except Exception: pass raise exception.AbortExtraction( f"{response.status_code} {response.reason} ({errors})") def _pagination_legacy(self, endpoint, params): extr = self.extractor if cursor := extr._init_cursor(): params["cursor"] = cursor original_retweets = (extr.retweets == "original") bottom = ("cursor-bottom-", "sq-cursor-bottom") while True: data = self._call(endpoint, params) instructions = data["timeline"]["instructions"] if not instructions: return 
extr._update_cursor(None) tweets = data["globalObjects"]["tweets"] users = data["globalObjects"]["users"] tweet_id = cursor = None tweet_ids = [] entries = () # process instructions for instr in instructions: if "addEntries" in instr: entries = instr["addEntries"]["entries"] elif "replaceEntry" in instr: entry = instr["replaceEntry"]["entry"] if entry["entryId"].startswith(bottom): cursor = (entry["content"]["operation"] ["cursor"]["value"]) # collect tweet IDs and cursor value for entry in entries: entry_startswith = entry["entryId"].startswith if entry_startswith(("tweet-", "sq-I-t-")): tweet_ids.append( entry["content"]["item"]["content"]["tweet"]["id"]) elif entry_startswith("homeConversation-"): tweet_ids.extend( entry["content"]["timelineModule"]["metadata"] ["conversationMetadata"]["allTweetIds"][::-1]) elif entry_startswith(bottom): cursor = entry["content"]["operation"]["cursor"] if not cursor.get("stopOnEmptyResponse", True): # keep going even if there are no tweets tweet_id = True cursor = cursor["value"] elif entry_startswith("conversationThread-"): tweet_ids.extend( item["entryId"][6:] for item in entry["content"]["timelineModule"]["items"] if item["entryId"].startswith("tweet-") ) # process tweets for tweet_id in tweet_ids: try: tweet = tweets[tweet_id] except KeyError: self.log.debug("Skipping %s (deleted)", tweet_id) continue if "retweeted_status_id_str" in tweet: retweet = tweets.get(tweet["retweeted_status_id_str"]) if original_retweets: if not retweet: continue retweet["retweeted_status_id_str"] = retweet["id_str"] retweet["_retweet_id_str"] = tweet["id_str"] tweet = retweet elif retweet: tweet["author"] = users[retweet["user_id_str"]] if "extended_entities" in retweet and \ "extended_entities" not in tweet: tweet["extended_entities"] = \ retweet["extended_entities"] tweet["user"] = users[tweet["user_id_str"]] yield tweet if "quoted_status_id_str" in tweet: if quoted := tweets.get(tweet["quoted_status_id_str"]): quoted = quoted.copy() quoted["author"] = users[quoted["user_id_str"]] quoted["quoted_by"] = tweet["user"]["screen_name"] quoted["quoted_by_id_str"] = tweet["id_str"] yield quoted # stop on empty response if not cursor or (not tweets and not tweet_id): return extr._update_cursor(None) params["cursor"] = extr._update_cursor(cursor) def _pagination_tweets(self, endpoint, variables, path=None, stop_tweets=True, features=None, field_toggles=None): extr = self.extractor original_retweets = (extr.retweets == "original") pinned_tweet = extr.pinned params = {"variables": None} if cursor := extr._init_cursor(): variables["cursor"] = cursor if features is None: features = self.features_pagination if features: params["features"] = self._json_dumps(features) if field_toggles: params["fieldToggles"] = self._json_dumps(field_toggles) while True: params["variables"] = self._json_dumps(variables) data = self._call(endpoint, params)["data"] try: if path is None: instructions = (data["user"]["result"]["timeline"] ["timeline"]["instructions"]) else: instructions = data for key in path: instructions = instructions[key] instructions = instructions["instructions"] cursor = None entries = None for instr in instructions: instr_type = instr.get("type") if instr_type == "TimelineAddEntries": if entries: entries.extend(instr["entries"]) else: entries = instr["entries"] elif instr_type == "TimelineAddToModule": entries = instr["moduleItems"] elif instr_type == "TimelinePinEntry": if pinned_tweet: pinned_tweet = instr["entry"] elif instr_type == "TimelineReplaceEntry": entry = instr["entry"] 
if entry["entryId"].startswith("cursor-bottom-"): cursor = entry["content"]["value"] if entries is None: if not cursor: return extr._update_cursor(None) entries = () except LookupError: extr.log.debug(data) if user := extr._user_obj: user = user["legacy"] if user.get("blocked_by"): if self.headers["x-twitter-auth-type"] and \ extr.config("logout"): extr.cookies_file = None del extr.cookies["auth_token"] self.headers["x-twitter-auth-type"] = None extr.log.info("Retrying API request as guest") continue raise exception.AuthorizationError( f"{user['screen_name']} blocked your account") elif user.get("protected"): raise exception.AuthorizationError( f"{user['screen_name']}'s Tweets are protected") raise exception.AbortExtraction( "Unable to retrieve Tweets from this timeline") tweets = [] tweet = None if pinned_tweet: if isinstance(pinned_tweet, dict): tweets.append(pinned_tweet) elif instructions[-1]["type"] == "TimelinePinEntry": tweets.append(instructions[-1]["entry"]) pinned_tweet = False for entry in entries: esw = entry["entryId"].startswith if esw("tweet-"): tweets.append(entry) elif esw(("profile-grid-", "communities-grid-")): if "content" in entry: tweets.extend(entry["content"]["items"]) else: tweets.append(entry) elif esw(("homeConversation-", "profile-conversation-", "conversationthread-")): tweets.extend(entry["content"]["items"]) elif esw("tombstone-"): item = entry["content"]["itemContent"] item["tweet_results"] = \ {"result": {"tombstone": item["tombstoneInfo"]}} tweets.append(entry) elif esw("cursor-bottom-"): cursor = entry["content"] if "itemContent" in cursor: cursor = cursor["itemContent"] if not cursor.get("stopOnEmptyResponse", True): # keep going even if there are no tweets tweet = True cursor = cursor.get("value") for entry in tweets: try: item = ((entry.get("content") or entry["item"]) ["itemContent"]) if "promotedMetadata" in item and not extr.ads: extr.log.debug( "Skipping %s (ad)", (entry.get("entryId") or "").rpartition("-")[2]) continue tweet = item["tweet_results"]["result"] if "tombstone" in tweet: tweet = self._process_tombstone( entry, tweet["tombstone"]) if not tweet: continue if "tweet" in tweet: tweet = tweet["tweet"] legacy = tweet["legacy"] tweet["sortIndex"] = entry.get("sortIndex") except KeyError: extr.log.debug( "Skipping %s (deleted)", (entry.get("entryId") or "").rpartition("-")[2]) continue if "retweeted_status_result" in legacy: retweet = legacy["retweeted_status_result"]["result"] if "tweet" in retweet: retweet = retweet["tweet"] if original_retweets: try: retweet["legacy"]["retweeted_status_id_str"] = \ retweet["rest_id"] retweet["_retweet_id_str"] = tweet["rest_id"] tweet = retweet except KeyError: continue else: try: legacy["retweeted_status_id_str"] = \ retweet["rest_id"] tweet["author"] = \ retweet["core"]["user_results"]["result"] rtlegacy = retweet["legacy"] if "note_tweet" in retweet: tweet["note_tweet"] = retweet["note_tweet"] if "extended_entities" in rtlegacy and \ "extended_entities" not in legacy: legacy["extended_entities"] = \ rtlegacy["extended_entities"] if "withheld_scope" in rtlegacy and \ "withheld_scope" not in legacy: legacy["withheld_scope"] = \ rtlegacy["withheld_scope"] legacy["full_text"] = rtlegacy["full_text"] except KeyError: pass yield tweet if "quoted_status_result" in tweet: try: quoted = tweet["quoted_status_result"]["result"] quoted["legacy"]["quoted_by"] = ( tweet["core"]["user_results"]["result"] ["legacy"]["screen_name"]) quoted["legacy"]["quoted_by_id_str"] = tweet["rest_id"] quoted["sortIndex"] = 
entry.get("sortIndex") yield quoted except KeyError: extr.log.debug( "Skipping quote of %s (deleted)", tweet.get("rest_id")) continue if stop_tweets and not tweet: return extr._update_cursor(None) if not cursor or cursor == variables.get("cursor"): return extr._update_cursor(None) variables["cursor"] = extr._update_cursor(cursor) def _pagination_users(self, endpoint, variables, path=None): extr = self.extractor if cursor := extr._init_cursor(): variables["cursor"] = cursor params = { "variables": None, "features" : self._json_dumps(self.features_pagination), } while True: cursor = entry = None params["variables"] = self._json_dumps(variables) data = self._call(endpoint, params)["data"] try: if path is None: instructions = (data["user"]["result"]["timeline"] ["timeline"]["instructions"]) else: for key in path: data = data[key] instructions = data["instructions"] except KeyError: return extr._update_cursor(None) for instr in instructions: if instr["type"] == "TimelineAddEntries": for entry in instr["entries"]: if entry["entryId"].startswith("user-"): try: user = (entry["content"]["itemContent"] ["user_results"]["result"]) except KeyError: pass else: if "rest_id" in user: yield user elif entry["entryId"].startswith("cursor-bottom-"): cursor = entry["content"]["value"] if not cursor or cursor.startswith(("-1|", "0|")) or not entry: return extr._update_cursor(None) variables["cursor"] = extr._update_cursor(cursor) def _handle_ratelimit(self, response): rl = self.extractor.config("ratelimit") if rl == "abort": raise exception.AbortExtraction("Rate limit exceeded") elif rl and isinstance(rl, str) and rl.startswith("wait:"): until = None seconds = text.parse_float(rl.partition(":")[2]) or 60.0 else: until = response.headers.get("x-rate-limit-reset") seconds = None if until else 60.0 self.extractor.wait(until=until, seconds=seconds) def _process_tombstone(self, entry, tombstone): text = (tombstone.get("richText") or tombstone["text"])["text"] tweet_id = entry["entryId"].rpartition("-")[2] if text.startswith("Age-restricted"): if self._nsfw_warning: self._nsfw_warning = False self.log.warning('"%s"', text) self.log.debug("Skipping %s ('%s')", tweet_id, text) @cache(maxage=365*86400, keyarg=1) def _login_impl(extr, username, password): def process(data, params=None): response = extr.request( url, params=params, headers=headers, json=data, method="POST", fatal=None) # update 'x-csrf-token' header (#5945) if csrf_token := response.cookies.get("ct0"): headers["x-csrf-token"] = csrf_token try: data = response.json() except ValueError: data = {"errors": ({"message": "Invalid response"},)} else: if response.status_code < 400: try: return (data["flow_token"], data["subtasks"][0]["subtask_id"]) except LookupError: pass errors = [] for error in data.get("errors") or (): msg = error.get("message") errors.append(f'"{msg}"' if msg else "Unknown error") extr.log.debug(response.text) raise exception.AuthenticationError(", ".join(errors)) cookies = extr.cookies cookies.clear() api = TwitterAPI(extr) api._authenticate_guest() url = "https://api.x.com/1.1/onboarding/task.json" params = {"flow_name": "login"} headers = api.headers extr.log.info("Logging in as %s", username) # init data = { "input_flow_data": { "flow_context": { "debug_overrides": {}, "start_location": {"location": "unknown"}, }, }, "subtask_versions": { "action_list": 2, "alert_dialog": 1, "app_download_cta": 1, "check_logged_in_account": 1, "choice_selection": 3, "contacts_live_sync_permission_prompt": 0, "cta": 7, "email_verification": 2, 
"end_flow": 1, "enter_date": 1, "enter_email": 2, "enter_password": 5, "enter_phone": 2, "enter_recaptcha": 1, "enter_text": 5, "enter_username": 2, "generic_urt": 3, "in_app_notification": 1, "interest_picker": 3, "js_instrumentation": 1, "menu_dialog": 1, "notifications_permission_prompt": 2, "open_account": 2, "open_home_timeline": 1, "open_link": 1, "phone_verification": 4, "privacy_options": 1, "security_key": 3, "select_avatar": 4, "select_banner": 2, "settings_list": 7, "show_code": 1, "sign_up": 2, "sign_up_review": 4, "tweet_selection_urt": 1, "update_users": 1, "upload_media": 1, "user_recommendations_list": 4, "user_recommendations_urt": 1, "wait_spinner": 3, "web_modal": 1, }, } flow_token, subtask = process(data, params) while not cookies.get("auth_token"): if subtask == "LoginJsInstrumentationSubtask": data = { "js_instrumentation": { "response": "{}", "link": "next_link", }, } elif subtask == "LoginEnterUserIdentifierSSO": data = { "settings_list": { "setting_responses": [ { "key": "user_identifier", "response_data": { "text_data": {"result": username}, }, }, ], "link": "next_link", }, } elif subtask == "LoginEnterPassword": data = { "enter_password": { "password": password, "link": "next_link", }, } elif subtask == "LoginEnterAlternateIdentifierSubtask": alt = extr.config("username-alt") or extr.input( "Alternate Identifier (username, email, phone number): ") data = { "enter_text": { "text": alt, "link": "next_link", }, } elif subtask == "LoginTwoFactorAuthChallenge": data = { "enter_text": { "text": extr.input("2FA Token: "), "link": "next_link", }, } elif subtask == "LoginAcid": data = { "enter_text": { "text": extr.input("Email Verification Code: "), "link": "next_link", }, } elif subtask == "AccountDuplicationCheck": data = { "check_logged_in_account": { "link": "AccountDuplicationCheck_false", }, } elif subtask == "ArkoseLogin": raise exception.AuthenticationError("Login requires CAPTCHA") elif subtask == "DenyLoginSubtask": raise exception.AuthenticationError("Login rejected as suspicious") elif subtask == "LoginSuccessSubtask": raise exception.AuthenticationError( "No 'auth_token' cookie received") else: raise exception.AbortExtraction(f"Unrecognized subtask {subtask}") inputs = {"subtask_id": subtask} inputs.update(data) data = { "flow_token": flow_token, "subtask_inputs": [inputs], } extr.sleep(random.uniform(1.0, 3.0), f"login ({subtask})") flow_token, subtask = process(data) return { cookie.name: cookie.value for cookie in extr.cookies } ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/unsplash.py0000644000175000017500000001061015040344700020742 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2021-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://unsplash.com/""" from .common import Extractor, Message from .. 
import text, util BASE_PATTERN = r"(?:https?://)?unsplash\.com" class UnsplashExtractor(Extractor): """Base class for unsplash extractors""" category = "unsplash" directory_fmt = ("{category}", "{user[username]}") filename_fmt = "{id}.{extension}" archive_fmt = "{id}" root = "https://unsplash.com" page_start = 1 per_page = 20 def __init__(self, match): Extractor.__init__(self, match) self.item = match[1] def items(self): fmt = self.config("format") or "raw" metadata = self.metadata() for photo in self.photos(): util.delete_items( photo, ("current_user_collections", "related_collections")) url = photo["urls"][fmt] text.nameext_from_url(url, photo) if metadata: photo.update(metadata) photo["extension"] = "jpg" photo["date"] = text.parse_datetime(photo["created_at"]) if "tags" in photo: photo["tags"] = [t["title"] for t in photo["tags"]] yield Message.Directory, photo yield Message.Url, url, photo def metadata(self): return None def skip(self, num): pages = num // self.per_page self.page_start += pages return pages * self.per_page def _pagination(self, url, params, results=False): params["per_page"] = self.per_page params["page"] = self.page_start while True: photos = self.request_json(url, params=params) if results: photos = photos["results"] yield from photos if len(photos) < self.per_page: return params["page"] += 1 class UnsplashImageExtractor(UnsplashExtractor): """Extractor for a single unsplash photo""" subcategory = "image" pattern = BASE_PATTERN + r"/photos/([^/?#]+)" example = "https://unsplash.com/photos/ID" def photos(self): url = f"{self.root}/napi/photos/{self.item}" return (self.request_json(url),) class UnsplashUserExtractor(UnsplashExtractor): """Extractor for all photos of an unsplash user""" subcategory = "user" pattern = BASE_PATTERN + r"/@(\w+)/?$" example = "https://unsplash.com/@USER" def photos(self): url = f"{self.root}/napi/users/{self.item}/photos" params = {"order_by": "latest"} return self._pagination(url, params) class UnsplashFavoriteExtractor(UnsplashExtractor): """Extractor for all likes of an unsplash user""" subcategory = "favorite" pattern = BASE_PATTERN + r"/@(\w+)/likes" example = "https://unsplash.com/@USER/likes" def photos(self): url = f"{self.root}/napi/users/{self.item}/likes" params = {"order_by": "latest"} return self._pagination(url, params) class UnsplashCollectionExtractor(UnsplashExtractor): """Extractor for an unsplash collection""" subcategory = "collection" pattern = BASE_PATTERN + r"/collections/([^/?#]+)(?:/([^/?#]+))?" example = "https://unsplash.com/collections/12345/TITLE" def __init__(self, match): UnsplashExtractor.__init__(self, match) self.title = match[2] or "" def metadata(self): return {"collection_id": self.item, "collection_title": self.title} def photos(self): url = f"{self.root}/napi/collections/{self.item}/photos" params = {"order_by": "latest"} return self._pagination(url, params) class UnsplashSearchExtractor(UnsplashExtractor): """Extractor for unsplash search results""" subcategory = "search" pattern = BASE_PATTERN + r"/s/photos/([^/?#]+)(?:\?([^#]+))?" 
example = "https://unsplash.com/s/photos/QUERY" def __init__(self, match): UnsplashExtractor.__init__(self, match) self.query = match[2] def photos(self): url = self.root + "/napi/search/photos" params = {"query": text.unquote(self.item.replace('-', ' '))} if self.query: params.update(text.parse_query(self.query)) return self._pagination(url, params, True) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/uploadir.py0000644000175000017500000000357515040344700020740 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2022-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://uploadir.com/""" from .common import Extractor, Message from .. import text class UploadirFileExtractor(Extractor): """Extractor for uploadir files""" category = "uploadir" subcategory = "file" root = "https://uploadir.com" filename_fmt = "{filename} ({id}).{extension}" archive_fmt = "{id}" pattern = r"(?:https?://)?uploadir\.com/(?:user/)?u(?:ploads)?/([^/?#]+)" example = "https://uploadir.com/u/ID" def __init__(self, match): Extractor.__init__(self, match) self.file_id = match[1] def items(self): url = f"{self.root}/u/{self.file_id}" response = self.request(url, method="HEAD", allow_redirects=False) if 300 <= response.status_code < 400: url = response.headers["Location"] extr = text.extract_from(self.request(url).text) name = text.unescape(extr("

    ", "

    ").strip()) url = self.root + extr('class="form" action="', '"') token = extr('name="authenticity_token" value="', '"') data = text.nameext_from_url(name, { "_http_method": "POST", "_http_data" : { "authenticity_token": token, "upload_id": self.file_id, }, }) else: hcd = response.headers.get("Content-Disposition") name = (hcd.partition("filename*=UTF-8''")[2] or text.extr(hcd, 'filename="', '"')) data = text.nameext_from_url(name) data["id"] = self.file_id yield Message.Directory, data yield Message.Url, url, data ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753381746.0 gallery_dl-1.30.2/gallery_dl/extractor/urlgalleries.py0000644000175000017500000000444715040475562021625 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://urlgalleries.net/""" from .common import GalleryExtractor, Message from .. import text, exception class UrlgalleriesGalleryExtractor(GalleryExtractor): """Base class for Urlgalleries extractors""" category = "urlgalleries" root = "https://urlgalleries.net" request_interval = (0.5, 1.5) pattern = (r"(?:https?://)()(?:(\w+)\.)?urlgalleries\.net" r"/(?:b/([^/?#]+)/)?(?:[\w-]+-)?(\d+)") example = "https://urlgalleries.net/b/BLOG/gallery-12345/TITLE" def items(self): _, blog_alt, blog, self.gallery_id = self.groups if not blog: blog = blog_alt url = f"{self.root}/b/{blog}/porn-gallery-{self.gallery_id}/?a=10000" with self.request(url, allow_redirects=False, fatal=...) as response: if 300 <= response.status_code < 500: if response.headers.get("location", "").endswith( "/not_found_adult.php"): raise exception.NotFoundError("gallery") raise exception.HttpError(None, response) page = response.text imgs = self.images(page) data = self.metadata(page) data["count"] = len(imgs) root = self.root yield Message.Directory, data for data["num"], img in enumerate(imgs, 1): page = self.request(root + img).text url = text.extr(page, "window.location.href = '", "'") yield Message.Queue, url, data def metadata(self, page): extr = text.extract_from(page) return { "gallery_id": self.gallery_id, "_site": extr(' title="', '"'), # site name "blog" : text.unescape(extr(' title="', '"')), "_rprt": extr(' title="', '"'), # report button "title": text.unescape(extr(' title="', '"').strip()), "date" : text.parse_datetime( extr(" images in gallery | ", "<"), "%B %d, %Y"), } def images(self, page): imgs = text.extr(page, 'id="wtf"', "
    ") return list(text.extract_iter(imgs, " href='", "'")) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/urlshortener.py0000644000175000017500000000246515040344700021652 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for general-purpose URL shorteners""" from .common import BaseExtractor, Message from .. import exception class UrlshortenerExtractor(BaseExtractor): """Base class for URL shortener extractors""" basecategory = "urlshortener" BASE_PATTERN = UrlshortenerExtractor.update({ "bitly": { "root": "https://bit.ly", "pattern": r"bit\.ly", }, "tco": { # t.co sends 'http-equiv="refresh"' (200) when using browser UA "headers": {"User-Agent": None}, "root": "https://t.co", "pattern": r"t\.co", }, }) class UrlshortenerLinkExtractor(UrlshortenerExtractor): """Extractor for general-purpose URL shorteners""" subcategory = "link" pattern = BASE_PATTERN + r"(/[^/?#]+)" example = "https://bit.ly/abcde" def items(self): url = self.root + self.groups[-1] location = self.request_location( url, headers=self.config_instance("headers"), notfound="URL") if not location: raise exception.AbortExtraction("Unable to resolve short URL") yield Message.Queue, location, {} ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/vanillarock.py0000644000175000017500000000517715040344700021426 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2019-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://vanilla-rock.com/""" from .common import Extractor, Message from .. import text class VanillarockExtractor(Extractor): """Base class for vanillarock extractors""" category = "vanillarock" root = "https://vanilla-rock.com" def __init__(self, match): Extractor.__init__(self, match) self.path = match[1] class VanillarockPostExtractor(VanillarockExtractor): """Extractor for blogposts on vanilla-rock.com""" subcategory = "post" directory_fmt = ("{category}", "{path}") filename_fmt = "{num:>02}.{extension}" archive_fmt = "{filename}" pattern = (r"(?:https?://)?(?:www\.)?vanilla-rock\.com" r"(/(?!category/|tag/)[^/?#]+)/?$") example = "https://vanilla-rock.com/TITLE" def items(self): extr = text.extract_from(self.request(self.root + self.path).text) name = extr('

    ', "<") imgs = [] while True: img = extr('
    ', '
    ') if not img: break imgs.append(text.extr(img, 'href="', '"')) data = { "count": len(imgs), "title": text.unescape(name), "path" : self.path.strip("/"), "date" : text.parse_datetime(extr( '
    ', '
    '), "%Y-%m-%d %H:%M"), "tags" : text.split_html(extr( '
    ', '
    '))[::2], } yield Message.Directory, data for data["num"], url in enumerate(imgs, 1): yield Message.Url, url, text.nameext_from_url(url, data) class VanillarockTagExtractor(VanillarockExtractor): """Extractor for vanillarock blog posts by tag or category""" subcategory = "tag" pattern = (r"(?:https?://)?(?:www\.)?vanilla-rock\.com" r"(/(?:tag|category)/[^?#]+)") example = "https://vanilla-rock.com/tag/TAG" def items(self): url = self.root + self.path data = {"_extractor": VanillarockPostExtractor} while url: extr = text.extract_from(self.request(url).text) while True: post = extr('

    ', '

    ')
                if not post:
                    break
                yield Message.Queue, text.extr(post, 'href="', '"'), data
            url = text.unescape(extr('class="next page-numbers" href="', '"'))


# gallery_dl-1.30.2/gallery_dl/extractor/vichan.py
# -*- coding: utf-8 -*-

# Copyright 2022-2025 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for vichan imageboards"""

from .common import BaseExtractor, Message
from .. import text


class VichanExtractor(BaseExtractor):
    """Base class for vichan extractors"""
    basecategory = "vichan"


BASE_PATTERN = VichanExtractor.update({
    "8kun": {
        "root": "https://8kun.top",
        "pattern": r"8kun\.top",
    },
    "smugloli": {
        "root": None,
        "pattern": r"smuglo(?:\.li|li\.net)",
    },
})


class VichanThreadExtractor(VichanExtractor):
    """Extractor for vichan threads"""
    subcategory = "thread"
    directory_fmt = ("{category}", "{board}", "{thread} {title}")
    filename_fmt = "{time}{num:?-//} {filename}.{extension}"
    archive_fmt = "{board}_{thread}_{tim}"
    pattern = BASE_PATTERN + r"/([^/?#]+)/res/(\d+)"
    example = "https://8kun.top/a/res/12345.html"

    def items(self):
        board = self.groups[-2]
        thread = self.groups[-1]
        url = f"{self.root}/{board}/res/{thread}.json"
        posts = self.request_json(url)["posts"]
        title = posts[0].get("sub") or text.remove_html(posts[0]["com"])

        process = (self._process_8kun if self.category == "8kun"
                   else self._process)
        data = {
            "board" : board,
            "thread": thread,
            "title" : text.unescape(title)[:50],
            "num"   : 0,
        }

        yield Message.Directory, data
        for post in posts:
            if "filename" in post:
                yield process(post, data)
                if "extra_files" in post:
                    for post["num"], filedata in enumerate(
                            post["extra_files"], 1):
                        yield process(post, filedata)

    def _process(self, post, data):
        post.update(data)
        ext = post["ext"]
        post["extension"] = ext[1:]
        post["url"] = url = \
            f"{self.root}/{post['board']}/src/{post['tim']}{ext}"
        return Message.Url, url, post

    def _process_8kun(self, post, data):
        post.update(data)
        ext = post["ext"]
        tim = post["tim"]
        if len(tim) > 16:
            url = f"https://media.128ducks.com/file_store/{tim}{ext}"
        else:
            url = f"https://media.128ducks.com/{post['board']}/src/{tim}{ext}"
        post["url"] = url
        post["extension"] = ext[1:]
        return Message.Url, url, post


class VichanBoardExtractor(VichanExtractor):
    """Extractor for vichan boards"""
    subcategory = "board"
    pattern = BASE_PATTERN + r"/([^/?#]+)(?:/index|/catalog|/\d+|/?$)"
    example = "https://8kun.top/a/"

    def items(self):
        board = self.groups[-1]
        url = f"{self.root}/{board}/threads.json"
        threads = self.request_json(url)

        for page in threads:
            for thread in page["threads"]:
                url = f"{self.root}/{board}/res/{thread['no']}.html"
                thread["page"] = page["page"]
                thread["_extractor"] = VichanThreadExtractor
                yield Message.Queue, url, thread


# gallery_dl-1.30.2/gallery_dl/extractor/vipergirls.py
# -*- coding: utf-8 -*-

# Copyright 2023-2025 Mike Fährmann
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.

"""Extractors for https://vipergirls.to/"""

from .common import Extractor, Message
from ..
import text, util, exception from ..cache import cache BASE_PATTERN = r"(?:https?://)?(?:www\.)?vipergirls\.to" class VipergirlsExtractor(Extractor): """Base class for vipergirls extractors""" category = "vipergirls" root = "https://vipergirls.to" request_interval = 0.5 request_interval_min = 0.2 cookies_domain = ".vipergirls.to" cookies_names = ("vg_userid", "vg_password") def _init(self): if domain := self.config("domain"): pos = domain.find("://") if pos >= 0: self.root = domain.rstrip("/") self.cookies_domain = "." + domain[pos+1:].strip("/") else: domain = domain.strip("/") self.root = "https://" + domain self.cookies_domain = "." + domain else: self.root = "https://viper.click" self.cookies_domain = ".viper.click" def items(self): self.login() root = self.posts() forum_title = root[1].attrib["title"] thread_title = root[2].attrib["title"] if like := self.config("like"): user_hash = root[0].get("hash") if len(user_hash) < 16: self.log.warning("Login required to like posts") like = False posts = root.iter("post") if self.page: util.advance(posts, (text.parse_int(self.page[5:]) - 1) * 15) for post in posts: images = list(post) data = post.attrib data["forum_title"] = forum_title data["thread_id"] = self.thread_id data["thread_title"] = thread_title data["post_id"] = data.pop("id") data["post_num"] = data.pop("number") data["post_title"] = data.pop("title") data["count"] = len(images) del data["imagecount"] yield Message.Directory, data if images: for data["num"], image in enumerate(images, 1): yield Message.Queue, image.attrib["main_url"], data if like: self.like(post, user_hash) def login(self): if self.cookies_check(self.cookies_names): return username, password = self._get_auth_info() if username: self.cookies_update(self._login_impl(username, password)) @cache(maxage=90*86400, keyarg=1) def _login_impl(self, username, password): self.log.info("Logging in as %s", username) url = f"{self.root}/login.php?do=login" data = { "vb_login_username": username, "vb_login_password": password, "do" : "login", "cookieuser" : "1", } response = self.request(url, method="POST", data=data) if not response.cookies.get("vg_password"): raise exception.AuthenticationError() return {cookie.name: cookie.value for cookie in response.cookies} def like(self, post, user_hash): url = self.root + "/post_thanks.php" params = { "do" : "post_thanks_add", "p" : post.get("id"), "securitytoken": user_hash, } with self.request(url, params=params, allow_redirects=False): pass class VipergirlsThreadExtractor(VipergirlsExtractor): """Extractor for vipergirls threads""" subcategory = "thread" pattern = (BASE_PATTERN + r"/threads/(\d+)(?:-[^/?#]+)?(/page\d+)?(?:$|#|\?(?!p=))") example = "https://vipergirls.to/threads/12345-TITLE" def __init__(self, match): VipergirlsExtractor.__init__(self, match) self.thread_id, self.page = match.groups() def posts(self): url = f"{self.root}/vr.php?t={self.thread_id}" return self.request_xml(url) class VipergirlsPostExtractor(VipergirlsExtractor): """Extractor for vipergirls posts""" subcategory = "post" pattern = (BASE_PATTERN + r"/threads/(\d+)(?:-[^/?#]+)?\?p=\d+[^#]*#post(\d+)") example = "https://vipergirls.to/threads/12345-TITLE?p=23456#post23456" def __init__(self, match): VipergirlsExtractor.__init__(self, match) self.thread_id, self.post_id = match.groups() self.page = 0 def posts(self): url = f"{self.root}/vr.php?p={self.post_id}" return self.request_xml(url) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 
gallery_dl-1.30.2/gallery_dl/extractor/vk.py0000644000175000017500000001470215040344700017533 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2021-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://vk.com/""" from .common import Extractor, Message from .. import text, util, exception BASE_PATTERN = r"(?:https://)?(?:www\.|m\.)?vk\.com" class VkExtractor(Extractor): """Base class for vk extractors""" category = "vk" directory_fmt = ("{category}", "{user[name]|user[id]}") filename_fmt = "{id}.{extension}" archive_fmt = "{id}" root = "https://vk.com" request_interval = (0.5, 1.5) def _init(self): self.offset = text.parse_int(self.config("offset")) def finalize(self): if self.offset: self.log.info("Use '-o offset=%s' to continue downloading " "from the current position", self.offset) def skip(self, num): self.offset += num return num def items(self): subn = util.re(r"/imp[fg]/").subn sizes = "wzyxrqpo" data = self.metadata() yield Message.Directory, data for photo in self.photos(): for size in sizes: size += "_" if size in photo: break else: self.log.warning("no photo URL found (%s)", photo.get("id")) continue try: url = photo[size + "src"] except KeyError: self.log.warning("no photo URL found (%s)", photo.get("id")) continue url_sub, count = subn("/", url.partition("?")[0]) if count: photo["_fallback"] = (url,) photo["url"] = url = url_sub else: photo["url"] = url try: _, photo["width"], photo["height"] = photo[size] except ValueError: # photo without width/height entries (#2535) photo["width"] = photo["height"] = 0 photo["id"] = photo["id"].rpartition("_")[2] photo.update(data) text.nameext_from_url(url, photo) yield Message.Url, url, photo def _pagination(self, photos_id): url = self.root + "/al_photos.php" headers = { "X-Requested-With": "XMLHttpRequest", "Origin" : self.root, "Referer" : self.root + "/" + photos_id, } data = { "act" : "show", "al" : "1", "direction": "1", "list" : photos_id, "offset" : self.offset, } while True: response = self.request( url, method="POST", headers=headers, data=data) if response.history and "/challenge.html" in response.url: raise exception.AbortExtraction( f"HTTP redirect to 'challenge' page:\n{response.url}") payload = response.json()["payload"][1] if len(payload) < 4: self.log.debug(payload) raise exception.AuthorizationError( text.unescape(payload[0]) if payload[0] else None) total = payload[1] photos = payload[3] offset_next = self.offset + len(photos) if offset_next >= total: # the last chunk of photos also contains the first few photos # again if 'total' is not a multiple of 10 if extra := total - offset_next: del photos[extra:] yield from photos self.offset = 0 return yield from photos data["offset"] = self.offset = offset_next class VkPhotosExtractor(VkExtractor): """Extractor for photos from a vk user""" subcategory = "photos" pattern = (BASE_PATTERN + r"/(?:" r"(?:albums|photos|id)(-?\d+)" r"|(?!(?:album|tag)-?\d+_?)([^/?#]+))") example = "https://vk.com/id12345" def __init__(self, match): VkExtractor.__init__(self, match) self.user_id, self.user_name = match.groups() def photos(self): return self._pagination("photos" + self.user_id) def metadata(self): if self.user_id: user_id = self.user_id prefix = "public" if user_id[0] == "-" else "id" url = f"{self.root}/{prefix}{user_id.lstrip('-')}" data = self._extract_profile(url) else: url = 
f"{self.root}/{self.user_name}" data = self._extract_profile(url) self.user_id = data["user"]["id"] return data def _extract_profile(self, url): page = self.request(url).text extr = text.extract_from(page) user = { "id" : extr('property="og:url" content="https://vk.com/id', '"'), "nick": text.unescape(extr( "", " | VK")), "info": text.unescape(extr( ',"activity":"', '","')).replace("\\/", "/"), "name": extr('href="https://m.vk.com/', '"'), } if user["id"]: user["group"] = False else: user["group"] = True user["id"] = extr('data-from-id="', '"') return {"user": user} class VkAlbumExtractor(VkExtractor): """Extractor for a vk album""" subcategory = "album" directory_fmt = ("{category}", "{user[id]}", "{album[id]}") pattern = BASE_PATTERN + r"/album(-?\d+)_(\d+)$" example = "https://vk.com/album12345_00" def __init__(self, match): VkExtractor.__init__(self, match) self.user_id, self.album_id = match.groups() def photos(self): return self._pagination(f"album{self.user_id}_{self.album_id}") def metadata(self): return { "user": {"id": self.user_id}, "album": {"id": self.album_id}, } class VkTaggedExtractor(VkExtractor): """Extractor for a vk tagged photos""" subcategory = "tagged" directory_fmt = ("{category}", "{user[id]}", "tags") pattern = BASE_PATTERN + r"/tag(-?\d+)$" example = "https://vk.com/tag12345" def __init__(self, match): VkExtractor.__init__(self, match) self.user_id = match[1] def photos(self): return self._pagination(f"tag{self.user_id}") def metadata(self): return {"user": {"id": self.user_id}} ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/vsco.py0000644000175000017500000002702715040344700020071 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2019-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://vsco.co/""" from .common import Extractor, Message, Dispatch from .. 
import text, util BASE_PATTERN = r"(?:https?://)?(?:www\.)?vsco\.co" USER_PATTERN = BASE_PATTERN + r"/([^/?#]+)" class VscoExtractor(Extractor): """Base class for vsco extractors""" category = "vsco" root = "https://vsco.co" directory_fmt = ("{category}", "{user}") filename_fmt = "{id}.{extension}" archive_fmt = "{id}" def __init__(self, match): Extractor.__init__(self, match) self.user = match[1].lower() def items(self): videos = self.config("videos", True) yield Message.Directory, {"user": self.user} for img in self.images(): if not img: continue elif "playback_url" in img: img = self._transform_video(img) elif "responsive_url" not in img: continue if img["is_video"]: if not videos: continue url = text.ensure_http_scheme(img["video_url"]) else: base = img["responsive_url"].partition("/")[2] cdn, _, path = base.partition("/") if cdn.startswith("aws"): url = f"https://image-{cdn}.vsco.co/{path}" elif cdn.isdecimal(): url = "https://image.vsco.co/" + base elif img["responsive_url"].startswith("http"): url = img["responsive_url"] else: url = "https://" + img["responsive_url"] data = text.nameext_from_url(url, { "id" : img["_id"], "user" : self.user, "grid" : img["grid_name"], "meta" : img.get("image_meta") or {}, "tags" : [tag["text"] for tag in img.get("tags") or ()], "date" : text.parse_timestamp(img["upload_date"] // 1000), "video" : img["is_video"], "width" : img["width"], "height": img["height"], "description": img.get("description") or "", }) if data["extension"] == "m3u8": url = "ytdl:" + url data["_ytdl_manifest"] = "hls" data["extension"] = "mp4" yield Message.Url, url, data def images(self): """Return an iterable with all relevant image objects""" def _extract_preload_state(self, url): page = self.request(url, notfound=self.subcategory).text return util.json_loads(text.extr(page, "__PRELOADED_STATE__ = ", "<") .replace('":undefined', '":null')) def _pagination(self, url, params, token, key, extra=None): headers = { "Referer" : f"{self.root}/{self.user}", "Authorization" : "Bearer " + token, "X-Client-Platform": "web", "X-Client-Build" : "1", } if extra: yield from map(self._transform_media, extra) while True: data = self.request_json(url, params=params, headers=headers) medias = data.get(key) if not medias: return if "cursor" in params: for media in medias: yield media[media["type"]] cursor = data.get("next_cursor") if not cursor: return params["cursor"] = cursor else: yield from medias params["page"] += 1 def _transform_media(self, media): if "responsiveUrl" not in media: return None media["_id"] = media["id"] media["is_video"] = media["isVideo"] media["grid_name"] = media["gridName"] media["upload_date"] = media["uploadDate"] media["responsive_url"] = media["responsiveUrl"] media["video_url"] = media.get("videoUrl") media["image_meta"] = media.get("imageMeta") return media def _transform_video(self, media): media["is_video"] = True media["grid_name"] = "" media["video_url"] = media["playback_url"] media["responsive_url"] = media["poster_url"] media["upload_date"] = media["created_date"] return media class VscoUserExtractor(Dispatch, VscoExtractor): """Extractor for a vsco user profile""" pattern = USER_PATTERN + r"/?$" example = "https://vsco.co/USER" def items(self): base = f"{self.root}/{self.user}/" return self._dispatch_extractors(( (VscoAvatarExtractor , base + "avatar"), (VscoGalleryExtractor , base + "gallery"), (VscoSpacesExtractor , base + "spaces"), (VscoCollectionExtractor, base + "collection"), ), ("gallery",)) class VscoGalleryExtractor(VscoExtractor): """Extractor 
for a vsco user's gallery""" subcategory = "gallery" pattern = USER_PATTERN + r"/(?:gallery|images)" example = "https://vsco.co/USER/gallery" def images(self): url = f"{self.root}/{self.user}/gallery" data = self._extract_preload_state(url) tkn = data["users"]["currentUser"]["tkn"] sid = str(data["sites"]["siteByUsername"][self.user]["site"]["id"]) url = f"{self.root}/api/3.0/medias/profile" params = { "site_id" : sid, "limit" : "14", "cursor" : None, } return self._pagination(url, params, tkn, "media") class VscoCollectionExtractor(VscoExtractor): """Extractor for images from a collection on vsco.co""" subcategory = "collection" directory_fmt = ("{category}", "{user}", "collection") archive_fmt = "c_{user}_{id}" pattern = USER_PATTERN + r"/collection" example = "https://vsco.co/USER/collection/1" def images(self): url = f"{self.root}/{self.user}/collection/1" data = self._extract_preload_state(url) tkn = data["users"]["currentUser"]["tkn"] cid = (data["sites"]["siteByUsername"][self.user] ["site"]["siteCollectionId"]) url = f"{self.root}/api/2.0/collections/{cid}/medias" params = {"page": 2, "size": "20"} return self._pagination(url, params, tkn, "medias", ( data["medias"]["byId"][mid["id"]]["media"] for mid in data ["collections"]["byId"][cid]["1"]["collection"] )) class VscoSpaceExtractor(VscoExtractor): """Extractor for a vsco.co space""" subcategory = "space" directory_fmt = ("{category}", "space", "{user}") archive_fmt = "s_{user}_{id}" pattern = BASE_PATTERN + r"/spaces/([^/?#]+)" example = "https://vsco.co/spaces/a1b2c3d4e5f" def images(self): url = f"{self.root}/spaces/{self.user}" data = self._extract_preload_state(url) tkn = data["users"]["currentUser"]["tkn"] sid = self.user posts = data["entities"]["posts"] images = data["entities"]["postImages"] for post in posts.values(): post["image"] = images[post["image"]] space = data["spaces"]["byId"][sid] space["postsList"] = [posts[pid] for pid in space["postsList"]] url = f"{self.root}/grpc/spaces/{sid}/posts" params = {} return self._pagination(url, params, tkn, space) def _pagination(self, url, params, token, data): headers = { "Accept" : "application/json", "Referer" : f"{self.root}/spaces/{self.user}", "Content-Type" : "application/json", "Authorization": "Bearer " + token, } while True: for post in data["postsList"]: post = self._transform_media(post["image"]) post["upload_date"] = post["upload_date"]["sec"] * 1000 yield post cursor = data["cursor"] if cursor.get("atEnd"): return params["cursor"] = cursor["postcursorcontext"]["postId"] data = self.request_json(url, params=params, headers=headers) class VscoSpacesExtractor(VscoExtractor): """Extractor for a vsco.co user's spaces""" subcategory = "spaces" pattern = USER_PATTERN + r"/spaces" example = "https://vsco.co/USER/spaces" def items(self): url = f"{self.root}/{self.user}/spaces" data = self._extract_preload_state(url) tkn = data["users"]["currentUser"]["tkn"] uid = data["sites"]["siteByUsername"][self.user]["site"]["userId"] headers = { "Accept" : "application/json", "Referer" : url, "Content-Type" : "application/json", "Authorization": "Bearer " + tkn, } # this would theoretically need to be paginated url = f"{self.root}/grpc/spaces/user/{uid}" data = self.request_json(url, headers=headers) for space in data["spacesWithRoleList"]: space = space["space"] url = f"{self.root}/spaces/{space['id']}" space["_extractor"] = VscoSpaceExtractor yield Message.Queue, url, space class VscoAvatarExtractor(VscoExtractor): """Extractor for vsco.co user avatars""" subcategory = "avatar" 
pattern = USER_PATTERN + r"/avatar" example = "https://vsco.co/USER/avatar" def images(self): url = f"{self.root}/{self.user}/gallery" page = self.request(url).text piid = text.extr(page, '"profileImageId":"', '"') url = "https://im.vsco.co/" + piid # needs GET request, since HEAD does not redirect to full URL response = self.request(url, allow_redirects=False) return ({ "_id" : piid, "is_video" : False, "grid_name" : "", "upload_date" : 0, "responsive_url": response.headers["Location"], "video_url" : "", "image_meta" : None, "width" : 0, "height" : 0, },) class VscoImageExtractor(VscoExtractor): """Extractor for individual images on vsco.co""" subcategory = "image" pattern = USER_PATTERN + r"/media/([0-9a-fA-F]+)" example = "https://vsco.co/USER/media/0123456789abcdef" def images(self): url = f"{self.root}/{self.user}/media/{self.groups[1]}" data = self._extract_preload_state(url) media = data["medias"]["byId"].popitem()[1]["media"] return (self._transform_media(media),) class VscoVideoExtractor(VscoExtractor): """Extractor for vsco.co videos links""" subcategory = "video" pattern = USER_PATTERN + r"/video/([^/?#]+)" example = "https://vsco.co/USER/video/012345678-9abc-def0" def images(self): url = f"{self.root}/{self.user}/video/{self.groups[1]}" data = self._extract_preload_state(url) media = data["medias"]["byId"].popitem()[1]["media"] return ({ "_id" : media["id"], "is_video" : True, "grid_name" : "", "upload_date" : media["createdDate"], "responsive_url": media["posterUrl"], "video_url" : media.get("playbackUrl"), "image_meta" : None, "width" : media["width"], "height" : media["height"], "description" : media["description"], },) ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/wallhaven.py0000644000175000017500000001753515040344700021103 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2018-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://wallhaven.cc/""" from .common import Extractor, Message, Dispatch from .. import text, exception class WallhavenExtractor(Extractor): """Base class for wallhaven extractors""" category = "wallhaven" root = "https://wallhaven.cc" filename_fmt = "{category}_{id}_{resolution}.{extension}" archive_fmt = "{id}" request_interval = 1.4 def _init(self): self.api = WallhavenAPI(self) def items(self): metadata = self.metadata() for wp in self.wallpapers(): self._transform(wp) wp.update(metadata) url = wp["url"] yield Message.Directory, wp yield Message.Url, url, text.nameext_from_url(url, wp) def wallpapers(self): """Return relevant 'wallpaper' objects""" def metadata(self): """Return general metadata""" return () def _transform(self, wp): wp["url"] = wp.pop("path") if "tags" in wp: wp["tags"] = [t["name"] for t in wp["tags"]] wp["date"] = text.parse_datetime( wp.pop("created_at"), "%Y-%m-%d %H:%M:%S") wp["width"] = wp.pop("dimension_x") wp["height"] = wp.pop("dimension_y") wp["wh_category"] = wp["category"] class WallhavenSearchExtractor(WallhavenExtractor): """Extractor for search results on wallhaven.cc""" subcategory = "search" directory_fmt = ("{category}", "{search[tags]}") archive_fmt = "s_{search[q]}_{id}" pattern = r"(?:https?://)?wallhaven\.cc/search(?:/?\?([^#]+))?" 
example = "https://wallhaven.cc/search?q=QUERY" def __init__(self, match): WallhavenExtractor.__init__(self, match) self.params = text.parse_query(match[1]) def wallpapers(self): return self.api.search(self.params) def metadata(self): return {"search": self.params} class WallhavenCollectionExtractor(WallhavenExtractor): """Extractor for a collection on wallhaven.cc""" subcategory = "collection" directory_fmt = ("{category}", "{username}", "{collection_id}") pattern = r"(?:https?://)?wallhaven\.cc/user/([^/?#]+)/favorites/(\d+)" example = "https://wallhaven.cc/user/USER/favorites/12345" def __init__(self, match): WallhavenExtractor.__init__(self, match) self.username, self.collection_id = match.groups() def wallpapers(self): return self.api.collection(self.username, self.collection_id) def metadata(self): return {"username": self.username, "collection_id": self.collection_id} class WallhavenUserExtractor(Dispatch, WallhavenExtractor): """Extractor for a wallhaven user""" pattern = r"(?:https?://)?wallhaven\.cc/user/([^/?#]+)/?$" example = "https://wallhaven.cc/user/USER" def items(self): base = f"{self.root}/user/{self.groups[0]}/" return self._dispatch_extractors(( (WallhavenUploadsExtractor , base + "uploads"), (WallhavenCollectionsExtractor, base + "favorites"), ), ("uploads",)) class WallhavenCollectionsExtractor(WallhavenExtractor): """Extractor for all collections of a wallhaven user""" subcategory = "collections" pattern = r"(?:https?://)?wallhaven\.cc/user/([^/?#]+)/favorites/?$" example = "https://wallhaven.cc/user/USER/favorites" def __init__(self, match): WallhavenExtractor.__init__(self, match) self.username = match[1] def items(self): base = f"{self.root}/user/{self.username}/favorites/" for collection in self.api.collections(self.username): collection["_extractor"] = WallhavenCollectionExtractor url = f"{base}{collection['id']}" yield Message.Queue, url, collection class WallhavenUploadsExtractor(WallhavenExtractor): """Extractor for all uploads of a wallhaven user""" subcategory = "uploads" directory_fmt = ("{category}", "{username}") archive_fmt = "u_{username}_{id}" pattern = r"(?:https?://)?wallhaven\.cc/user/([^/?#]+)/uploads" example = "https://wallhaven.cc/user/USER/uploads" def __init__(self, match): WallhavenExtractor.__init__(self, match) self.username = match[1] def wallpapers(self): params = {"q": "@" + self.username} return self.api.search(params) def metadata(self): return {"username": self.username} class WallhavenImageExtractor(WallhavenExtractor): """Extractor for individual wallpaper on wallhaven.cc""" subcategory = "image" pattern = (r"(?:https?://)?(?:wallhaven\.cc/w/|whvn\.cc/" r"|w\.wallhaven\.cc/[a-z]+/\w\w/wallhaven-)(\w+)") example = "https://wallhaven.cc/w/ID" def __init__(self, match): WallhavenExtractor.__init__(self, match) self.wallpaper_id = match[1] def wallpapers(self): return (self.api.info(self.wallpaper_id),) class WallhavenAPI(): """Interface for wallhaven's API Ref: https://wallhaven.cc/help/api """ def __init__(self, extractor): self.extractor = extractor key = extractor.config("api-key") if key is None: key = "25HYZenXTICjzBZXzFSg98uJtcQVrDs2" extractor.log.debug("Using default API Key") else: extractor.log.debug("Using custom API Key") self.headers = {"X-API-Key": key} def info(self, wallpaper_id): endpoint = "/v1/w/" + wallpaper_id return self._call(endpoint)["data"] def collection(self, username, collection_id): endpoint = f"/v1/collections/{username}/{collection_id}" return self._pagination(endpoint) def collections(self, username): 
endpoint = "/v1/collections/" + username return self._pagination(endpoint, metadata=False) def search(self, params): endpoint = "/v1/search" return self._pagination(endpoint, params) def _call(self, endpoint, params=None): url = "https://wallhaven.cc/api" + endpoint while True: response = self.extractor.request( url, params=params, headers=self.headers, fatal=None) if response.status_code < 400: return response.json() if response.status_code == 429: self.extractor.wait(seconds=60) continue self.extractor.log.debug("Server response: %s", response.text) raise exception.AbortExtraction( f"API request failed " f"({response.status_code} {response.reason})") def _pagination(self, endpoint, params=None, metadata=None): if params is None: params_ptr = None params = {} else: params_ptr = params params = params.copy() if metadata is None: metadata = self.extractor.config("metadata") while True: data = self._call(endpoint, params) meta = data.get("meta") if params_ptr is not None: if meta and "query" in meta: query = meta["query"] if isinstance(query, dict): params_ptr["tags"] = query.get("tag") params_ptr["tag_id"] = query.get("id") else: params_ptr["tags"] = query params_ptr["tag_id"] = 0 params_ptr = None if metadata: for wp in data["data"]: yield self.info(str(wp["id"])) else: yield from data["data"] if not meta or meta["current_page"] >= meta["last_page"]: return params["page"] = meta["current_page"] + 1 ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/wallpapercave.py0000644000175000017500000000337015040344700021740 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2021 David Hoppenbrouwers # Copyright 2023-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://wallpapercave.com/""" from .common import Extractor, Message from .. import text class WallpapercaveImageExtractor(Extractor): """Extractor for images on wallpapercave.com""" category = "wallpapercave" subcategory = "image" root = "https://wallpapercave.com" pattern = r"(?:https?://)?(?:www\.)?wallpapercave\.com/" example = "https://wallpapercave.com/w/wp12345" def items(self): page = self.request(text.ensure_http_scheme(self.url)).text path = None for path in text.extract_iter(page, 'class="download" href="', '"'): image = text.nameext_from_url(path) yield Message.Directory, image yield Message.Url, self.root + path, image if path is None: try: path = text.rextr( page, 'href="', '"', page.index('id="tdownload"'), None) except Exception: pass else: image = text.nameext_from_url(path) yield Message.Directory, image yield Message.Url, self.root + path, image if path is None: for wp in text.extract_iter( page, 'class="wallpaper" id="wp', ''): if path := text.rextr(wp, ' src="', '"'): image = text.nameext_from_url(path) yield Message.Directory, image yield Message.Url, self.root + path, image ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/warosu.py0000644000175000017500000000727015040344700020435 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2017-2023 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. 
"""Extractors for https://warosu.org/""" from .common import Extractor, Message from .. import text class WarosuThreadExtractor(Extractor): """Extractor for threads on warosu.org""" category = "warosu" subcategory = "thread" root = "https://warosu.org" directory_fmt = ("{category}", "{board}", "{thread} - {title}") filename_fmt = "{tim} {filename}.{extension}" archive_fmt = "{board}_{thread}_{tim}" pattern = r"(?:https?://)?(?:www\.)?warosu\.org/([^/]+)/thread/(\d+)" example = "https://warosu.org/a/thread/12345" def __init__(self, match): Extractor.__init__(self, match) self.board, self.thread = match.groups() def items(self): url = f"{self.root}/{self.board}/thread/{self.thread}" page = self.request(url).text data = self.metadata(page) posts = self.posts(page) if not data["title"]: data["title"] = text.unescape(text.remove_html( posts[0]["com"]))[:50] yield Message.Directory, data for post in posts: if "image" in post: for key in ("w", "h", "no", "time", "tim"): post[key] = text.parse_int(post[key]) dt = text.parse_timestamp(post["time"]) # avoid zero-padding 'day' with %d post["now"] = dt.strftime(f"%a, %b {dt.day}, %Y %H:%M:%S") post.update(data) yield Message.Url, post["image"], post def metadata(self, page): boardname = text.extr(page, "", "") title = text.unescape(text.extr(page, "class=filetitle>", "<")) return { "board" : self.board, "board_name": boardname.split(" - ")[1], "thread" : self.thread, "title" : title, } def posts(self, page): """Build a list of all post objects""" page = text.extr(page, "
    ") needle = "" return [self.parse(post) for post in page.split(needle)] def parse(self, post): """Build post object by extracting data from an HTML post""" data = self._extract_post(post) if '"), "name": extr("class=postername>", "<").strip(), "time": extr("class=posttime title=", "000>"), "com" : text.unescape(text.remove_html(extr( "
    ", "
    ").strip())), } def _extract_image(self, post, data): extr = text.extract_from(post) extr('', "") data["fsize"] = extr("File: ", ", ") data["w"] = extr("", "x") data["h"] = extr("", ", ") data["filename"] = text.unquote(extr( "", "<").rstrip().rpartition(".")[0]) extr("
    ", "") if url := extr(""): if url[0] == "/": data["image"] = self.root + url elif "warosu." not in url: return False else: data["image"] = url return True return False ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/weasyl.py0000644000175000017500000001574415040344700020426 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.weasyl.com/""" from .common import Extractor, Message from .. import text, util BASE_PATTERN = r"(?:https://)?(?:www\.)?weasyl.com/" class WeasylExtractor(Extractor): category = "weasyl" directory_fmt = ("{category}", "{owner_login}") filename_fmt = "{submitid} {title}.{extension}" archive_fmt = "{submitid}" root = "https://www.weasyl.com" useragent = util.USERAGENT def populate_submission(self, data): # Some submissions don't have content and can be skipped if "submission" in data["media"]: data["url"] = data["media"]["submission"][0]["url"] data["date"] = text.parse_datetime( data["posted_at"][:19], "%Y-%m-%dT%H:%M:%S") text.nameext_from_url(data["url"], data) return True return False def _init(self): self.session.headers['X-Weasyl-API-Key'] = self.config("api-key") def request_submission(self, submitid): return self.request_json( f"{self.root}/api/submissions/{submitid}/view") def retrieve_journal(self, journalid): data = self.request_json( f"{self.root}/api/journals/{journalid}/view") data["extension"] = "html" data["html"] = "text:" + data["content"] data["date"] = text.parse_datetime(data["posted_at"]) return data def submissions(self, owner_login, folderid=None): metadata = self.config("metadata") url = f"{self.root}/api/users/{owner_login}/gallery" params = { "nextid" : None, "folderid": folderid, } while True: data = self.request_json(url, params=params) for submission in data["submissions"]: if metadata: submission = self.request_submission( submission["submitid"]) if self.populate_submission(submission): submission["folderid"] = folderid # Do any submissions have more than one url? If so # a urllist of the submission array urls would work. 
yield Message.Url, submission["url"], submission if not data["nextid"]: return params["nextid"] = data["nextid"] class WeasylSubmissionExtractor(WeasylExtractor): subcategory = "submission" pattern = BASE_PATTERN + r"(?:~[\w~-]+/submissions|submission|view)/(\d+)" example = "https://www.weasyl.com/~USER/submissions/12345/TITLE" def __init__(self, match): WeasylExtractor.__init__(self, match) self.submitid = match[1] def items(self): data = self.request_submission(self.submitid) if self.populate_submission(data): yield Message.Directory, data yield Message.Url, data["url"], data class WeasylSubmissionsExtractor(WeasylExtractor): subcategory = "submissions" pattern = BASE_PATTERN + r"(?:~|submissions/)([\w~-]+)/?$" example = "https://www.weasyl.com/submissions/USER" def __init__(self, match): WeasylExtractor.__init__(self, match) self.owner_login = match[1] def items(self): yield Message.Directory, {"owner_login": self.owner_login} yield from self.submissions(self.owner_login) class WeasylFolderExtractor(WeasylExtractor): subcategory = "folder" directory_fmt = ("{category}", "{owner_login}", "{folder_name}") pattern = BASE_PATTERN + r"submissions/([\w~-]+)\?folderid=(\d+)" example = "https://www.weasyl.com/submissions/USER?folderid=12345" def __init__(self, match): WeasylExtractor.__init__(self, match) self.owner_login, self.folderid = match.groups() def items(self): iter = self.submissions(self.owner_login, self.folderid) # Folder names are only on single submission api calls msg, url, data = next(iter) details = self.request_submission(data["submitid"]) yield Message.Directory, details yield msg, url, data yield from iter class WeasylJournalExtractor(WeasylExtractor): subcategory = "journal" filename_fmt = "{journalid} {title}.{extension}" archive_fmt = "{journalid}" pattern = BASE_PATTERN + r"journal/(\d+)" example = "https://www.weasyl.com/journal/12345" def __init__(self, match): WeasylExtractor.__init__(self, match) self.journalid = match[1] def items(self): data = self.retrieve_journal(self.journalid) yield Message.Directory, data yield Message.Url, data["html"], data class WeasylJournalsExtractor(WeasylExtractor): subcategory = "journals" filename_fmt = "{journalid} {title}.{extension}" archive_fmt = "{journalid}" pattern = BASE_PATTERN + r"journals/([\w~-]+)" example = "https://www.weasyl.com/journals/USER" def __init__(self, match): WeasylExtractor.__init__(self, match) self.owner_login = match[1] def items(self): yield Message.Directory, {"owner_login": self.owner_login} url = f"{self.root}/journals/{self.owner_login}" page = self.request(url).text for journalid in text.extract_iter(page, 'href="/journal/', '/'): data = self.retrieve_journal(journalid) yield Message.Url, data["html"], data class WeasylFavoriteExtractor(WeasylExtractor): subcategory = "favorite" directory_fmt = ("{category}", "{user}", "Favorites") pattern = BASE_PATTERN + r"favorites(?:\?userid=(\d+)|/([^/?#]+))" example = "https://www.weasyl.com/favorites?userid=12345" def items(self): userid, username = self.groups owner_login = lastid = None if username: owner_login = username path = "/favorites/" + username else: path = "/favorites" params = { "userid" : userid, "feature": "submit", } while True: page = self.request(self.root + path, params=params).text pos = page.index('id="favorites-content"') if not owner_login: owner_login = text.extr(page, 'Next (', pos) except ValueError: return path = text.unescape(text.rextr(page, 'href="', '"', pos)) params = None 
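# Illustrative sketch (not part of the original module): the 'nextid'-based
# pagination used by WeasylExtractor.submissions() above, driven directly with
# `requests`. The endpoint and the 'X-Weasyl-API-Key' header come from the
# code above; "MY_API_KEY" is a placeholder, and the response is assumed to
# contain only the "submissions" and "nextid" fields used here.

def _example_weasyl_gallery(owner_login, api_key="MY_API_KEY"):
    """Yield raw submission objects from a user's gallery, page by page."""
    import requests

    url = f"https://www.weasyl.com/api/users/{owner_login}/gallery"
    headers = {"X-Weasyl-API-Key": api_key}
    params = {"nextid": None}

    while True:
        data = requests.get(url, headers=headers, params=params).json()
        yield from data["submissions"]
        if not data.get("nextid"):         # no further pages
            return
        params["nextid"] = data["nextid"]  # continue after this batch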
././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/webmshare.py0000644000175000017500000000363015040344700021066 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2022-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://webmshare.com/""" from .common import Extractor, Message from .. import text class WebmshareVideoExtractor(Extractor): """Extractor for webmshare videos""" category = "webmshare" subcategory = "video" root = "https://webmshare.com" filename_fmt = "{id}{title:? //}.{extension}" archive_fmt = "{id}" pattern = (r"(?:https?://)?(?:s\d+\.)?webmshare\.com" r"/(?:play/|download-webm/)?(\w{3,})") example = "https://webmshare.com/_ID_" def __init__(self, match): Extractor.__init__(self, match) self.video_id = match[1] def items(self): url = f"{self.root}/{self.video_id}" extr = text.extract_from(self.request(url).text) data = { "title": text.unescape(extr( 'property="og:title" content="', '"').rpartition(" — ")[0]), "thumb": "https:" + extr('property="og:image" content="', '"'), "url" : "https:" + extr('property="og:video" content="', '"'), "width": text.parse_int(extr( 'property="og:video:width" content="', '"')), "height": text.parse_int(extr( 'property="og:video:height" content="', '"')), "date" : text.parse_datetime(extr( "Added ", "<"), "%B %d, %Y"), "views": text.parse_int(extr('glyphicon-eye-open">
    ', '<')), "id" : self.video_id, "filename" : self.video_id, "extension": "webm", } if data["title"] == "webmshare": data["title"] = "" yield Message.Directory, data yield Message.Url, data["url"], data ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/webtoons.py0000644000175000017500000002013415040344700020747 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2020 Leonardo Taccari # Copyright 2021-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.webtoons.com/""" from .common import GalleryExtractor, Extractor, Message from .. import exception, text, util BASE_PATTERN = r"(?:https?://)?(?:www\.)?webtoons\.com" LANG_PATTERN = BASE_PATTERN + r"/(([^/?#]+)" class WebtoonsBase(): category = "webtoons" root = "https://www.webtoons.com" directory_fmt = ("{category}", "{comic}") filename_fmt = "{episode_no}-{num:>02}{type:?-//}.{extension}" archive_fmt = "{title_no}_{episode_no}_{num}" cookies_domain = ".webtoons.com" request_interval = (0.5, 1.5) def setup_agegate_cookies(self): self.cookies_update({ "atGDPR" : "AD_CONSENT", "needCCPA" : "false", "needCOPPA" : "false", "needGDPR" : "false", "pagGDPR" : "true", "ageGatePass": "true", }) _init = setup_agegate_cookies def request(self, url, **kwargs): response = Extractor.request(self, url, **kwargs) if response.history and "/ageGate" in response.url: raise exception.AbortExtraction( f"HTTP redirect to age gate check ('{response.url}')") return response class WebtoonsEpisodeExtractor(WebtoonsBase, GalleryExtractor): """Extractor for an episode on webtoons.com""" subcategory = "episode" pattern = (LANG_PATTERN + r"/([^/?#]+)/([^/?#]+)/[^/?#]+)" r"/viewer\?([^#'\"]+)") example = ("https://www.webtoons.com/en/GENRE/TITLE/NAME/viewer" "?title_no=123&episode_no=12345") def _init(self): self.setup_agegate_cookies() base, self.lang, self.genre, self.comic, query = self.groups params = text.parse_query(query) self.title_no = params.get("title_no") self.episode_no = params.get("episode_no") self.page_url = f"{self.root}/{base}/viewer?{query}" def metadata(self, page): extr = text.extract_from(page) title = extr('", "<") episode_name = extr('

    #", "<") else: episode = "" if extr('", "") else: username = author_name = "" return { "genre" : self.genre, "comic" : self.comic, "title_no" : self.title_no, "episode_no" : self.episode_no, "title" : text.unescape(title), "episode" : episode, "comic_name" : text.unescape(comic_name), "episode_name": text.unescape(episode_name), "username" : username, "author_name" : text.unescape(author_name), "description" : text.unescape(descr), "lang" : self.lang, "language" : util.code_to_language(self.lang), } def images(self, page): quality = self.config("quality") if quality is None or quality == "original": quality = {"jpg": False, "jpeg": False, "webp": False} elif not quality: quality = None elif isinstance(quality, str): quality = {"jpg": quality, "jpeg": quality} elif isinstance(quality, int): quality = "q" + str(quality) quality = {"jpg": quality, "jpeg": quality} elif not isinstance(quality, dict): quality = None results = [] for url in text.extract_iter( page, 'class="_images" data-url="', '"'): if quality is not None: path, _, query = url.rpartition("?") type = quality.get(path.rpartition(".")[2].lower()) if type is False: url = path elif type: url = f"{path}?type={type}" results.append((_url(url), None)) return results def assets(self, page): if self.config("thumbnails", False): active = text.extr(page, 'class="on ', '') url = _url(text.extr(active, 'data-url="', '"')) return ({"url": url, "type": "thumbnail"},) class WebtoonsComicExtractor(WebtoonsBase, Extractor): """Extractor for an entire comic on webtoons.com""" subcategory = "comic" categorytransfer = True filename_fmt = "{type}.{extension}" archive_fmt = "{title_no}_{type}" pattern = LANG_PATTERN + r"/([^/?#]+)/([^/?#]+))/list\?([^#]+)" example = "https://www.webtoons.com/en/GENRE/TITLE/list?title_no=123" def items(self): kw = self.kwdict base, kw["lang"], kw["genre"], kw["comic"], query = self.groups params = text.parse_query(query) kw["title_no"] = title_no = text.parse_int(params.get("title_no")) kw["page"] = page_no = text.parse_int(params.get("page"), 1) path = f"/{base}/list?title_no={title_no}&page={page_no}" response = self.request(self.root + path) if response.history: parts = response.url.split("/") base = "/".join(parts[3:-1]) page = response.text if self.config("banners") and (asset := self._asset_banner(page)): yield Message.Directory, asset yield Message.Url, asset["url"], asset data = {"_extractor": WebtoonsEpisodeExtractor} while True: for url in self.get_episode_urls(page): params = text.parse_query(url.rpartition("?")[2]) data["episode_no"] = text.parse_int(params.get("episode_no")) yield Message.Queue, url, data kw["page"] = page_no = page_no + 1 path = f"/{base}/list?title_no={title_no}&page={page_no}" if path not in page: return page = self.request(self.root + path).text def get_episode_urls(self, page): """Extract and return all episode urls in 'page'""" page = text.extr(page, 'id="_listUl"', "") return [ match[0] for match in WebtoonsEpisodeExtractor.pattern.finditer(page) ] def _asset_banner(self, page): try: pos = page.index('", " | Weeb Central")), "author" : text.split_html(extr("Author", ""))[1::2], "tags" : text.split_html(extr("Tag", ""))[1::2], "type" : text.remove_html(extr("Type: ", "")), "status" : text.remove_html(extr("Status: ", "")), "release" : text.remove_html(extr("Released: ", "")), "official": ">Yes" in extr("Official Translatio", ""), "description": text.unescape(text.remove_html(extr( "Description", ""))), } class WeebcentralChapterExtractor(WeebcentralBase, ChapterExtractor): 
"""Extractor for manga chapters from weebcentral.com""" pattern = BASE_PATTERN + r"(/chapters/(\w+))" example = "https://weebcentral.com/chapters/01JHABCDEFGHIJKLMNOPQRSTUV" def metadata(self, page): extr = text.extract_from(page) manga_id = extr("'series_id': '", "'") chapter_type = extr("'chapter_type': '", "'") chapter, sep, minor = extr("'number': '", "'").partition(".") data = { "chapter": text.parse_int(chapter), "chapter_id": self.groups[1], "chapter_type": chapter_type, "chapter_minor": sep + minor, } data.update(self._extract_manga_data(manga_id)) return data def images(self, page): referer = self.page_url url = referer + "/images" params = { "is_prev" : "False", "current_page" : "1", "reading_style": "long_strip", } headers = { "Accept" : "*/*", "Referer" : referer, "HX-Request" : "true", "HX-Current-URL": referer, } page = self.request(url, params=params, headers=headers).text extr = text.extract_from(page) results = [] while True: src = extr('src="', '"') if not src: break results.append((src, { "width" : text.parse_int(extr('width="' , '"')), "height": text.parse_int(extr('height="', '"')), })) return results class WeebcentralMangaExtractor(WeebcentralBase, MangaExtractor): """Extractor for manga from weebcentral.com""" chapterclass = WeebcentralChapterExtractor pattern = BASE_PATTERN + r"/series/(\w+)" example = "https://weebcentral.com/series/01J7ABCDEFGHIJKLMNOPQRSTUV/TITLE" def chapters(self, _): manga_id = self.groups[0] referer = f"{self.root}/series/{manga_id}" url = referer + "/full-chapter-list" headers = { "Accept" : "*/*", "Referer" : referer, "HX-Request" : "true", "HX-Target" : "chapter-list", "HX-Current-URL": referer, } page = self.request(url, headers=headers).text extr = text.extract_from(page) data = self._extract_manga_data(manga_id) base = self.root + "/chapters/" results = [] while True: chapter_id = extr("/chapters/", '"') if not chapter_id: break type, _, chapter = extr('', "<").partition(" ") chapter, sep, minor = chapter.partition(".") chapter = { "chapter_id" : chapter_id, "chapter" : text.parse_int(chapter), "chapter_minor": sep + minor, "chapter_type" : type, "date" : text.parse_datetime( extr(' datetime="', '"')[:-5], "%Y-%m-%dT%H:%M:%S"), } chapter.update(data) results.append((base + chapter_id, chapter)) return results ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/weibo.py0000644000175000017500000003103715040344700020220 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2019-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.weibo.com/""" from .common import Extractor, Message, Dispatch from .. import text, util, exception from ..cache import cache import random BASE_PATTERN = r"(?:https?://)?(?:www\.|m\.)?weibo\.c(?:om|n)" USER_PATTERN = BASE_PATTERN + r"/(?:(u|n|p(?:rofile)?)/)?([^/?#]+)(?:/home)?" 
class WeiboExtractor(Extractor): category = "weibo" directory_fmt = ("{category}", "{user[screen_name]}") filename_fmt = "{status[id]}_{num:>02}.{extension}" archive_fmt = "{status[id]}_{num}" root = "https://weibo.com" request_interval = (1.0, 2.0) def __init__(self, match): Extractor.__init__(self, match) self._prefix, self.user = match.groups() def _init(self): self.livephoto = self.config("livephoto", True) self.retweets = self.config("retweets", False) self.videos = self.config("videos", True) self.movies = self.config("movies", False) self.gifs = self.config("gifs", True) self.gifs_video = (self.gifs == "video") cookies = _cookie_cache() if cookies is not None: self.cookies.update(cookies) def request(self, url, **kwargs): response = Extractor.request(self, url, **kwargs) if response.history: if "login.sina.com" in response.url: raise exception.AbortExtraction( f"HTTP redirect to login page " f"({response.url.partition('?')[0]})") if "passport.weibo.com" in response.url: self._sina_visitor_system(response) response = Extractor.request(self, url, **kwargs) return response def items(self): original_retweets = (self.retweets == "original") for status in self.statuses(): if "ori_mid" in status and not self.retweets: self.log.debug("Skipping %s (快转 retweet)", status["id"]) continue if "retweeted_status" in status: if not self.retweets: self.log.debug("Skipping %s (retweet)", status["id"]) continue # videos of the original post are in status # images of the original post are in status["retweeted_status"] files = [] self._extract_status(status, files) self._extract_status(status["retweeted_status"], files) if original_retweets: status = status["retweeted_status"] else: files = [] self._extract_status(status, files) status["date"] = text.parse_datetime( status["created_at"], "%a %b %d %H:%M:%S %z %Y") status["count"] = len(files) yield Message.Directory, status for num, file in enumerate(files, 1): if file["url"].startswith("http:"): file["url"] = "https:" + file["url"][5:] if "filename" not in file: text.nameext_from_url(file["url"], file) if file["extension"] == "json": file["extension"] = "mp4" file["status"] = status file["num"] = num yield Message.Url, file["url"], file def _extract_status(self, status, files): if "mix_media_info" in status: for item in status["mix_media_info"]["items"]: type = item.get("type") if type == "video": if self.videos: files.append(self._extract_video( item["data"]["media_info"])) elif type == "pic": files.append(item["data"]["largest"].copy()) else: self.log.warning("Unknown media type '%s'", type) return if pic_ids := status.get("pic_ids"): pics = status["pic_infos"] for pic_id in pic_ids: pic = pics[pic_id] pic_type = pic.get("type") if pic_type == "gif" and self.gifs: if self.gifs_video: files.append({"url": pic["video"]}) else: files.append(pic["largest"].copy()) elif pic_type == "livephoto" and self.livephoto: files.append(pic["largest"].copy()) files.append({"url": pic["video"]}) else: files.append(pic["largest"].copy()) if "page_info" in status: info = status["page_info"] if "media_info" in info and self.videos: if info.get("type") != "5" or self.movies: files.append(self._extract_video(info["media_info"])) else: self.log.debug("%s: Ignoring 'movie' video", status["id"]) def _extract_video(self, info): try: media = max(info["playback_list"], key=lambda m: m["meta"]["quality_index"]) except Exception: return {"url": (info.get("stream_url_hd") or info.get("stream_url") or "")} else: return media["play_info"].copy() def _status_by_id(self, status_id): 
url = f"{self.root}/ajax/statuses/show?id={status_id}" return self.request_json(url) def _user_id(self): if len(self.user) >= 10 and self.user.isdecimal(): return self.user[-10:] else: url = (f"{self.root}/ajax/profile/info?" f"{'screen_name' if self._prefix == 'n' else 'custom'}=" f"{self.user}") return self.request_json(url)["data"]["user"]["idstr"] def _pagination(self, endpoint, params): url = self.root + "/ajax" + endpoint headers = { "X-Requested-With": "XMLHttpRequest", "X-XSRF-TOKEN": None, "Referer": f"{self.root}/u/{params['uid']}", } while True: response = self.request(url, params=params, headers=headers) headers["Accept"] = "application/json, text/plain, */*" headers["X-XSRF-TOKEN"] = response.cookies.get("XSRF-TOKEN") data = response.json() if not data.get("ok"): self.log.debug(response.content) if "since_id" not in params: # first iteration raise exception.AbortExtraction( f'"{data.get("msg") or "unknown error"}"') data = data["data"] statuses = data["list"] yield from statuses # videos, newvideo if cursor := data.get("next_cursor"): if cursor == -1: return params["cursor"] = cursor continue # album if since_id := data.get("since_id"): params["sinceid"] = since_id continue # home, article if "page" in params: if not statuses: return params["page"] += 1 continue # feed, last album page try: params["since_id"] = statuses[-1]["id"] - 1 except LookupError: return def _sina_visitor_system(self, response): self.log.info("Sina Visitor System") passport_url = "https://passport.weibo.com/visitor/genvisitor" headers = {"Referer": response.url} data = { "cb": "gen_callback", "fp": '{"os":"1","browser":"Gecko109,0,0,0","fonts":"undefined",' '"screenInfo":"1920*1080*24","plugins":""}', } page = Extractor.request( self, passport_url, method="POST", headers=headers, data=data).text data = util.json_loads(text.extr(page, "(", ");"))["data"] passport_url = "https://passport.weibo.com/visitor/visitor" params = { "a" : "incarnate", "t" : data["tid"], "w" : "3" if data.get("new_tid") else "2", "c" : f"{data.get('confidence') or 100:>03}", "gc" : "", "cb" : "cross_domain", "from" : "weibo", "_rand": random.random(), } response = Extractor.request(self, passport_url, params=params) _cookie_cache.update("", response.cookies) class WeiboUserExtractor(WeiboExtractor): """Extractor for weibo user profiles""" subcategory = "user" pattern = USER_PATTERN + r"(?:$|#)" example = "https://weibo.com/USER" # do NOT override 'initialize()' # it is needed for 'self._user_id()' # def initialize(self): # pass def items(self): base = f"{self.root}/u/{self._user_id()}?tabtype=" return Dispatch._dispatch_extractors(self, ( (WeiboHomeExtractor , base + "home"), (WeiboFeedExtractor , base + "feed"), (WeiboVideosExtractor , base + "video"), (WeiboNewvideoExtractor, base + "newVideo"), (WeiboAlbumExtractor , base + "album"), ), ("feed",)) class WeiboHomeExtractor(WeiboExtractor): """Extractor for weibo 'home' listings""" subcategory = "home" pattern = USER_PATTERN + r"\?tabtype=home" example = "https://weibo.com/USER?tabtype=home" def statuses(self): endpoint = "/profile/myhot" params = {"uid": self._user_id(), "page": 1, "feature": "2"} return self._pagination(endpoint, params) class WeiboFeedExtractor(WeiboExtractor): """Extractor for weibo user feeds""" subcategory = "feed" pattern = USER_PATTERN + r"\?tabtype=feed" example = "https://weibo.com/USER?tabtype=feed" def statuses(self): endpoint = "/statuses/mymblog" params = {"uid": self._user_id(), "feature": "0"} return self._pagination(endpoint, params) class 
WeiboVideosExtractor(WeiboExtractor): """Extractor for weibo 'video' listings""" subcategory = "videos" pattern = USER_PATTERN + r"\?tabtype=video" example = "https://weibo.com/USER?tabtype=video" def statuses(self): endpoint = "/profile/getprofilevideolist" params = {"uid": self._user_id()} for status in self._pagination(endpoint, params): yield status["video_detail_vo"] class WeiboNewvideoExtractor(WeiboExtractor): """Extractor for weibo 'newVideo' listings""" subcategory = "newvideo" pattern = USER_PATTERN + r"\?tabtype=newVideo" example = "https://weibo.com/USER?tabtype=newVideo" def statuses(self): endpoint = "/profile/getWaterFallContent" params = {"uid": self._user_id()} return self._pagination(endpoint, params) class WeiboArticleExtractor(WeiboExtractor): """Extractor for weibo 'article' listings""" subcategory = "article" pattern = USER_PATTERN + r"\?tabtype=article" example = "https://weibo.com/USER?tabtype=article" def statuses(self): endpoint = "/statuses/mymblog" params = {"uid": self._user_id(), "page": 1, "feature": "10"} return self._pagination(endpoint, params) class WeiboAlbumExtractor(WeiboExtractor): """Extractor for weibo 'album' listings""" subcategory = "album" pattern = USER_PATTERN + r"\?tabtype=album" example = "https://weibo.com/USER?tabtype=album" def statuses(self): endpoint = "/profile/getImageWall" params = {"uid": self._user_id()} seen = set() for image in self._pagination(endpoint, params): mid = image["mid"] if mid not in seen: seen.add(mid) status = self._status_by_id(mid) if status.get("ok") != 1: self.log.debug("Skipping status %s (%s)", mid, status) else: yield status class WeiboStatusExtractor(WeiboExtractor): """Extractor for images from a status on weibo.cn""" subcategory = "status" pattern = BASE_PATTERN + r"/(detail|status|\d+)/(\w+)" example = "https://weibo.com/detail/12345" def statuses(self): status = self._status_by_id(self.user) if status.get("ok") != 1: self.log.debug(status) raise exception.NotFoundError("status") return (status,) @cache(maxage=365*86400) def _cookie_cache(): return None ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/wikiart.py0000644000175000017500000001131315040344700020560 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2019-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.wikiart.org/""" from .common import Extractor, Message from .. 
import text BASE_PATTERN = r"(?:https?://)?(?:www\.)?wikiart\.org/([a-z]+)" class WikiartExtractor(Extractor): """Base class for wikiart extractors""" category = "wikiart" filename_fmt = "{id}_{title}.{extension}" archive_fmt = "{id}" root = "https://www.wikiart.org" def __init__(self, match): Extractor.__init__(self, match) self.lang = match[1] def items(self): data = self.metadata() yield Message.Directory, data for painting in self.paintings(): url = painting["image"] painting.update(data) yield Message.Url, url, text.nameext_from_url(url, painting) def metadata(self): """Return a dict with general metadata""" def paintings(self): """Return an iterable containing all relevant 'painting' objects""" def _pagination(self, url, extra_params=None, key="Paintings", stop=False): headers = { "X-Requested-With": "XMLHttpRequest", "Referer": url, } params = { "json": "2", "layout": "new", "page": 1, "resultType": "masonry", } if extra_params: params.update(extra_params) while True: data = self.request_json(url, headers=headers, params=params) items = data.get(key) if not items: return yield from items if stop: return params["page"] += 1 class WikiartArtistExtractor(WikiartExtractor): """Extractor for an artist's paintings on wikiart.org""" subcategory = "artist" directory_fmt = ("{category}", "{artist[artistName]}") pattern = BASE_PATTERN + r"/(?!\w+-by-)([\w-]+)/?$" example = "https://www.wikiart.org/en/ARTIST" def __init__(self, match): WikiartExtractor.__init__(self, match) self.artist_name = match[2] self.artist = None def metadata(self): url = f"{self.root}/{self.lang}/{self.artist_name}?json=2" self.artist = self.request_json(url) return {"artist": self.artist} def paintings(self): url = f"{self.root}/{self.lang}/{self.artist_name}/mode/all-paintings" return self._pagination(url) class WikiartImageExtractor(WikiartArtistExtractor): """Extractor for individual paintings on wikiart.org""" subcategory = "image" pattern = BASE_PATTERN + r"/(?!(?:paintings|artists)-by-)([\w-]+)/([\w-]+)" example = "https://www.wikiart.org/en/ARTIST/TITLE" def __init__(self, match): WikiartArtistExtractor.__init__(self, match) self.title = match[3] def paintings(self): title, sep, year = self.title.rpartition("-") if not sep or not year.isdecimal(): title = self.title url = (f"{self.root}/{self.lang}/Search/" f"{self.artist.get('artistName') or self.artist_name} {title}") return self._pagination(url, stop=True) class WikiartArtworksExtractor(WikiartExtractor): """Extractor for artwork collections on wikiart.org""" subcategory = "artworks" directory_fmt = ("{category}", "Artworks by {group!c}", "{type}") pattern = BASE_PATTERN + r"/paintings-by-([\w-]+)/([\w-]+)" example = "https://www.wikiart.org/en/paintings-by-GROUP/TYPE" def __init__(self, match): WikiartExtractor.__init__(self, match) self.group = match[2] self.type = match[3] def metadata(self): return {"group": self.group, "type": self.type} def paintings(self): url = f"{self.root}/{self.lang}/paintings-by-{self.group}/{self.type}" return self._pagination(url) class WikiartArtistsExtractor(WikiartExtractor): """Extractor for artist collections on wikiart.org""" subcategory = "artists" pattern = (BASE_PATTERN + r"/artists-by-([\w-]+)/([\w-]+)") example = "https://www.wikiart.org/en/artists-by-GROUP/TYPE" def __init__(self, match): WikiartExtractor.__init__(self, match) self.group = match[2] self.type = match[3] def items(self): url = f"{self.root}/{self.lang}/App/Search/Artists-by-{self.group}" params = {"json": "3", "searchterm": self.type} for artist in 
self._pagination(url, params, "Artists"): artist["_extractor"] = WikiartArtistExtractor yield Message.Queue, self.root + artist["artistUrl"], artist ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/wikifeet.py0000644000175000017500000000452215040344700020721 0ustar00mikemike# -*- coding: utf-8 -*- # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.wikifeet.com/""" from .common import GalleryExtractor from .. import text, util class WikifeetGalleryExtractor(GalleryExtractor): """Extractor for image galleries from wikifeet.com""" category = "wikifeet" directory_fmt = ("{category}", "{celebrity}") filename_fmt = "{category}_{celeb}_{pid}.{extension}" archive_fmt = "{type}_{celeb}_{pid}" pattern = (r"(?:https?://)(?:(?:www\.)?wikifeetx?|" r"men\.wikifeet)\.com/([^/?#]+)") example = "https://www.wikifeet.com/CELEB" def __init__(self, match): self.root = text.root_from_url(match[0]) if "wikifeetx.com" in self.root: self.category = "wikifeetx" self.type = "men" if "://men." in self.root else "women" self.celeb = match[1] GalleryExtractor.__init__(self, match, self.root + "/" + self.celeb) def metadata(self, page): extr = text.extract_from(page) return { "celeb" : self.celeb, "type" : self.type, "birthplace": text.unescape(extr('"bplace":"', '"')), "birthday" : text.parse_datetime(text.unescape( extr('"bdate":"', '"'))[:10], "%Y-%m-%d"), "shoesize" : text.unescape(extr('"ssize":', ',')), "rating" : text.parse_float(extr('"score":', ',')), "celebrity" : text.unescape(extr('"cname":"', '"')), } def images(self, page): tagmap = { "C": "Close-up", "T": "Toenails", "N": "Nylons", "A": "Arches", "S": "Soles", "B": "Barefoot", } gallery = text.extr(page, '"gallery":[', '],') base = f"https://pics.wikifeet.com/{self.celeb}-Feet-" return [ (f"{base}{data['pid']}.jpg", { "pid" : data["pid"], "width" : data["pw"], "height": data["ph"], "tags" : [ tagmap[tag] for tag in data["tags"] if tag in tagmap ], }) for data in util.json_loads(f"[{gallery}]") ] ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/wikimedia.py0000644000175000017500000002031515040344700021053 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2022 Ailothaen # Copyright 2024-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for Wikimedia sites""" from .common import BaseExtractor, Message from .. 
import text, exception from ..cache import cache class WikimediaExtractor(BaseExtractor): """Base class for wikimedia extractors""" basecategory = "wikimedia" filename_fmt = "{filename} ({sha1[:8]}).{extension}" archive_fmt = "{sha1}" request_interval = (1.0, 2.0) def __init__(self, match): BaseExtractor.__init__(self, match) if self.category == "wikimedia": self.category = self.root.split(".")[-2] elif self.category in ("fandom", "wikigg"): self.category = ( f"{self.category}-" f"{self.root.partition('.')[0].rpartition('/')[2]}") self.per_page = self.config("limit", 50) if useragent := self.config_instance("useragent"): self.useragent = useragent def _init(self): if api_path := self.config_instance("api-path"): if api_path[0] == "/": self.api_url = self.root + api_path else: self.api_url = api_path else: self.api_url = None @cache(maxage=36500*86400, keyarg=1) def _search_api_path(self, root): self.log.debug("Probing possible API endpoints") for path in ("/api.php", "/w/api.php", "/wiki/api.php"): url = root + path response = self.request(url, method="HEAD", fatal=None) if response.status_code < 400: return url raise exception.AbortExtraction("Unable to find API endpoint") def prepare(self, image): """Adjust the content of an image object""" image["metadata"] = { m["name"]: m["value"] for m in image["metadata"] or ()} image["commonmetadata"] = { m["name"]: m["value"] for m in image["commonmetadata"] or ()} filename = image["canonicaltitle"] image["filename"], _, image["extension"] = \ filename.partition(":")[2].rpartition(".") image["date"] = text.parse_datetime( image["timestamp"], "%Y-%m-%dT%H:%M:%SZ") def items(self): for info in self._pagination(self.params): try: image = info["imageinfo"][0] except LookupError: self.log.debug("Missing 'imageinfo' for %s", info) continue self.prepare(image) yield Message.Directory, image yield Message.Url, image["url"], image if self.subcategories: base = self.root + "/wiki/" self.params["gcmtype"] = "subcat" for subcat in self._pagination(self.params): url = base + subcat["title"].replace(" ", "_") subcat["_extractor"] = WikimediaArticleExtractor yield Message.Queue, url, subcat def _pagination(self, params): """ https://www.mediawiki.org/wiki/API:Query https://opendata.stackexchange.com/questions/13381 """ url = self.api_url if not url: url = self._search_api_path(self.root) params["action"] = "query" params["format"] = "json" params["prop"] = "imageinfo" params["iiprop"] = ( "timestamp|user|userid|comment|canonicaltitle|url|size|" "sha1|mime|metadata|commonmetadata|extmetadata|bitdepth" ) while True: data = self.request_json(url, params=params) # ref: https://www.mediawiki.org/wiki/API:Errors_and_warnings if error := data.get("error"): self.log.error("%s: %s", error["code"], error["info"]) return # MediaWiki will emit warnings for non-fatal mistakes such as # invalid parameter instead of raising an error if warnings := data.get("warnings"): self.log.debug("MediaWiki returned warnings: %s", warnings) try: pages = data["query"]["pages"] except KeyError: pass else: yield from pages.values() try: continuation = data["continue"] except KeyError: break params.update(continuation) BASE_PATTERN = WikimediaExtractor.update({ "wikimedia": { "root": None, "pattern": r"[a-z]{2,}\." 
r"wik(?:i(?:pedia|quote|books|source|news|versity|data" r"|voyage)|tionary)" r"\.org", "api-path": "/w/api.php", }, "wikispecies": { "root": "https://species.wikimedia.org", "pattern": r"species\.wikimedia\.org", "api-path": "/w/api.php", }, "wikimediacommons": { "root": "https://commons.wikimedia.org", "pattern": r"commons\.wikimedia\.org", "api-path": "/w/api.php", }, "mediawiki": { "root": "https://www.mediawiki.org", "pattern": r"(?:www\.)?mediawiki\.org", "api-path": "/w/api.php", }, "fandom": { "root": None, "pattern": r"[\w-]+\.fandom\.com", "api-path": "/api.php", }, "wikigg": { "root": None, "pattern": r"\w+\.wiki\.gg", "api-path": "/api.php", }, "mariowiki": { "root": "https://www.mariowiki.com", "pattern": r"(?:www\.)?mariowiki\.com", "api-path": "/api.php", }, "bulbapedia": { "root": "https://bulbapedia.bulbagarden.net", "pattern": r"(?:bulbapedia|archives)\.bulbagarden\.net", "api-path": "/w/api.php", }, "pidgiwiki": { "root": "https://www.pidgi.net", "pattern": r"(?:www\.)?pidgi\.net", "api-path": "/wiki/api.php", }, "azurlanewiki": { "root": "https://azurlane.koumakan.jp", "pattern": r"azurlane\.koumakan\.jp", "api-path": "/w/api.php", "useragent": "Googlebot-Image/1.0", }, }) class WikimediaArticleExtractor(WikimediaExtractor): """Extractor for wikimedia articles""" subcategory = "article" directory_fmt = ("{category}", "{page}") pattern = BASE_PATTERN + r"/(?!static/)([^?#]+)" example = "https://en.wikipedia.org/wiki/TITLE" def __init__(self, match): WikimediaExtractor.__init__(self, match) path = self.groups[-1] if path[2] == "/": self.root = self.root + "/" + path[:2] path = path[3:] if path.startswith("wiki/"): path = path[5:] pre, sep, _ = path.partition(":") prefix = pre.lower() if sep else None self.title = path = text.unquote(path) if prefix: self.subcategory = prefix if prefix == "category": self.subcategories = \ True if self.config("subcategories", True) else False self.params = { "generator": "categorymembers", "gcmtitle" : path, "gcmtype" : "file", "gcmlimit" : self.per_page, } elif prefix == "file": self.subcategories = False self.params = { "titles" : path, } else: self.subcategories = False self.params = { "generator": "images", "gimlimit" : self.per_page, "titles" : path, } def prepare(self, image): WikimediaExtractor.prepare(self, image) image["page"] = self.title class WikimediaWikiExtractor(WikimediaExtractor): """Extractor for all files on a MediaWiki instance""" subcategory = "wiki" pattern = BASE_PATTERN + r"/?$" example = "https://en.wikipedia.org/" def __init__(self, match): WikimediaExtractor.__init__(self, match) # ref: https://www.mediawiki.org/wiki/API:Allpages self.params = { "generator" : "allpages", "gapnamespace": 6, # "File" namespace "gaplimit" : self.per_page, } ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/xfolio.py0000644000175000017500000001155415040344700020415 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://xfolio.jp/""" from .common import Extractor, Message from .. import text, exception BASE_PATTERN = r"(?:https?://)?xfolio\.jp(?:/[^/?#]+)?" 
class XfolioExtractor(Extractor): """Base class for xfolio extractors""" category = "xfolio" root = "https://xfolio.jp" cookies_domain = ".xfolio.jp" directory_fmt = ("{category}", "{creator_slug}", "{work_id}") filename_fmt = "{work_id}_{image_id}.{extension}" archive_fmt = "{work_id}_{image_id}" request_interval = (0.5, 1.5) def _init(self): XfolioExtractor._init = Extractor._init if not self.cookies_check(("xfolio_session",)): self.log.error("'xfolio_session' cookie required") def items(self): data = {"_extractor": XfolioWorkExtractor} for work in self.works(): yield Message.Queue, work, data def request(self, url, **kwargs): response = Extractor.request(self, url, **kwargs) if "/system/recaptcha" in response.url: raise exception.AbortExtraction("Bot check / CAPTCHA page") return response class XfolioWorkExtractor(XfolioExtractor): subcategory = "work" pattern = BASE_PATTERN + r"/portfolio/([^/?#]+)/works/(\d+)" example = "https://xfolio.jp/portfolio/USER/works/12345" def items(self): creator, work_id = self.groups url = f"{self.root}/portfolio/{creator}/works/{work_id}" html = self.request(url).text work = self._extract_data(html) files = self._extract_files(html, work) work["count"] = len(files) yield Message.Directory, work for work["num"], file in enumerate(files, 1): file.update(work) yield Message.Url, file["url"], file def _extract_data(self, html): creator, work_id = self.groups extr = text.extract_from(html) return { "title" : text.unescape(extr( 'property="og:title" content="', '"').rpartition(" - ")[0]), "description" : text.unescape(extr( 'property="og:description" content="', '"')), "creator_id" : extr(' data-creator-id="', '"'), "creator_userid" : extr(' data-creator-user-id="', '"'), "creator_name" : extr(' data-creator-name="', '"'), "creator_profile": text.unescape(extr( ' data-creator-profile="', '"')), "series_id" : extr("/series/", '"'), "creator_slug" : creator, "work_id" : work_id, } def _extract_files(self, html, work): files = [] work_id = work["work_id"] for img in text.extract_iter( html, 'class="article__wrap_img', ""): image_id = text.extr(img, "/fullscale_image?image_id=", "&") if not image_id: self.log.warning( "%s: 'fullscale_image' not available", work_id) continue files.append({ "image_id" : image_id, "extension": "jpg", "url": (f"{self.root}/user_asset.php?id={image_id}&work_id=" f"{work_id}&work_image_id={image_id}&type=work_image"), "_http_headers": {"Referer": ( f"{self.root}/fullscale_image" f"?image_id={image_id}&work_id={work_id}")}, }) return files class XfolioUserExtractor(XfolioExtractor): subcategory = "user" pattern = BASE_PATTERN + r"/portfolio/([^/?#]+)(?:/works)?/?(?:$|\?|#)" example = "https://xfolio.jp/portfolio/USER" def works(self): url = f"{self.root}/portfolio/{self.groups[0]}/works" while True: html = self.request(url).text for item in text.extract_iter( html, '
    "): yield text.extr(item, ' href="', '"') pager = text.extr(html, ' class="pager__list_next', "") url = text.extr(pager, ' href="', '"') if not url: return url = text.unescape(url) class XfolioSeriesExtractor(XfolioExtractor): subcategory = "series" pattern = BASE_PATTERN + r"/portfolio/([^/?#]+)/series/(\d+)" example = "https://xfolio.jp/portfolio/USER/series/12345" def works(self): creator, series_id = self.groups url = f"{self.root}/portfolio/{creator}/series/{series_id}" html = self.request(url).text return [ text.extr(item, ' href="', '"') for item in text.extract_iter( html, 'class="listWrap--title">', "") ] ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/xhamster.py0000644000175000017500000001026615040344700020747 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2019-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://xhamster.com/""" from .common import Extractor, Message from .. import text, util BASE_PATTERN = (r"(?:https?://)?((?:[\w-]+\.)?xhamster" r"(?:\d?\.(?:com|one|desi)|\.porncache\.net))") class XhamsterExtractor(Extractor): """Base class for xhamster extractors""" category = "xhamster" def __init__(self, match): self.root = "https://" + match[1] Extractor.__init__(self, match) class XhamsterGalleryExtractor(XhamsterExtractor): """Extractor for image galleries on xhamster.com""" subcategory = "gallery" directory_fmt = ("{category}", "{user[name]}", "{gallery[id]} {gallery[title]}") filename_fmt = "{num:>03}_{id}.{extension}" archive_fmt = "{id}" pattern = BASE_PATTERN + r"(/photos/gallery/[^/?#]+)" example = "https://xhamster.com/photos/gallery/12345" def items(self): data = self.metadata() yield Message.Directory, data for num, image in enumerate(self.images(), 1): url = image["imageURL"] image.update(data) text.nameext_from_url(url, image) image["num"] = num image["extension"] = "webp" del image["modelName"] yield Message.Url, url, image def metadata(self): data = self.data = self._extract_data(self.root + self.groups[1]) gallery = data["galleryPage"] info = gallery["infoProps"] model = gallery["galleryModel"] author = info["authorInfoProps"] return { "user": { "id" : text.parse_int(model["userId"]), "url" : author["authorLink"], "name" : author["authorName"], "verified" : True if author.get("verified") else False, "subscribers": info["subscribeButtonProps"]["subscribers"], }, "gallery": { "id" : text.parse_int(gallery["id"]), "tags" : [t["label"] for t in info["categoriesTags"]], "date" : text.parse_timestamp(model["created"]), "views" : text.parse_int(model["views"]), "likes" : text.parse_int(model["rating"]["likes"]), "dislikes" : text.parse_int(model["rating"]["dislikes"]), "title" : model["title"], "description": model["description"], "thumbnail" : model["thumbURL"], }, "count": text.parse_int(gallery["photosCount"]), } def images(self): data = self.data self.data = None while True: yield from data["photosGalleryModel"]["photos"] pagination = data["galleryPage"]["paginationProps"] if pagination["currentPageNumber"] >= pagination["lastPageNumber"]: return url = (pagination["pageLinkTemplate"][:-3] + str(pagination["currentPageNumber"] + 1)) data = self._extract_data(url) def _extract_data(self, url): page = self.request(url).text return util.json_loads(text.extr( page, "window.initials=", 
"").rstrip("\n\r;")) class XhamsterUserExtractor(XhamsterExtractor): """Extractor for all galleries of an xhamster user""" subcategory = "user" pattern = BASE_PATTERN + r"/users/([^/?#]+)(?:/photos)?/?(?:$|[?#])" example = "https://xhamster.com/users/USER/photos" def items(self): url = f"{self.root}/users/{self.groups[1]}/photos" data = {"_extractor": XhamsterGalleryExtractor} while url: extr = text.extract_from(self.request(url).text) while True: url = extr('thumb-image-container role-pop" href="', '"') if not url: break yield Message.Queue, url, data url = extr('data-page="next" href="', '"') ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/xvideos.py0000644000175000017500000000771715040344700020604 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2017-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.xvideos.com/""" from .common import GalleryExtractor, Extractor, Message from .. import text, util BASE_PATTERN = (r"(?:https?://)?(?:www\.)?xvideos\.com" r"/(?:profiles|(?:amateur-|model-)?channels)") class XvideosBase(): """Base class for xvideos extractors""" category = "xvideos" root = "https://www.xvideos.com" class XvideosGalleryExtractor(XvideosBase, GalleryExtractor): """Extractor for user profile galleries on xvideos.com""" subcategory = "gallery" directory_fmt = ("{category}", "{user[name]}", "{gallery[id]} {gallery[title]}") filename_fmt = "{category}_{gallery[id]}_{num:>03}.{extension}" archive_fmt = "{gallery[id]}_{num}" pattern = BASE_PATTERN + r"/([^/?#]+)/photos/(\d+)" example = "https://www.xvideos.com/profiles/USER/photos/12345" def __init__(self, match): self.user, self.gallery_id = match.groups() url = f"{self.root}/profiles/{self.user}/photos/{self.gallery_id}" GalleryExtractor.__init__(self, match, url) def metadata(self, page): extr = text.extract_from(page) user = { "id" : text.parse_int(extr('"id_user":', ',')), "display": extr('"display":"', '"'), "sex" : extr('"sex":"', '"'), "name" : self.user, } title = extr('"title":"', '"') user["description"] = extr( '', '').strip() tags = extr('Tagged:', '<').strip() return { "user": user, "gallery": { "id" : text.parse_int(self.gallery_id), "title": text.unescape(title), "tags" : text.unescape(tags).split(", ") if tags else [], }, } def images(self, page): results = [ (url, None) for url in text.extract_iter( page, 'Next"))["data"] if not isinstance(data["galleries"], dict): return if "0" in data["galleries"]: del data["galleries"]["0"] galleries = [ { "id" : text.parse_int(gid), "title": text.unescape(gdata["title"]), "count": gdata["nb_pics"], "_extractor": XvideosGalleryExtractor, } for gid, gdata in data["galleries"].items() ] galleries.sort(key=lambda x: x["id"]) base = f"{self.root}/profiles/{self.user}/photos/" for gallery in galleries: url = f"{base}{gallery['id']}" yield Message.Queue, url, gallery ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/yiffverse.py0000644000175000017500000001102615040344700021111 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2024 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. 
"""Extractors for https://yiffverse.com/""" from .booru import BooruExtractor from .. import text import collections BASE_PATTERN = r"(?:https?://)?(?:www\.)?yiffverse\.com" class YiffverseExtractor(BooruExtractor): category = "yiffverse" root = "https://yiffverse.com" root_cdn = "https://furry34com.b-cdn.net" filename_fmt = "{category}_{id}.{extension}" per_page = 30 TAG_TYPES = { None: "general", 1 : "general", 2 : "copyright", 4 : "character", 8 : "artist", } FORMATS = ( ("100", "mov.mp4"), ("101", "mov720.mp4"), ("102", "mov480.mp4"), ("10" , "pic.jpg"), ) def _file_url(self, post): files = post["files"] for fmt, extension in self.FORMATS: if fmt in files: break else: fmt = next(iter(files)) post_id = post["id"] root = self.root_cdn if files[fmt][0] else self.root post["file_url"] = url = \ f"{root}/posts/{post_id // 1000}/{post_id}/{post_id}.{extension}" post["format_id"] = fmt post["format"] = extension.partition(".")[0] return url def _prepare(self, post): post.pop("files", None) post["date"] = text.parse_datetime( post["created"], "%Y-%m-%dT%H:%M:%S.%fZ") post["filename"], _, post["format"] = post["filename"].rpartition(".") if "tags" in post: post["tags"] = [t["value"] for t in post["tags"]] def _tags(self, post, _): if "tags" not in post: post.update(self._fetch_post(post["id"])) tags = collections.defaultdict(list) for tag in post["tags"]: tags[tag["type"]].append(tag["value"]) types = self.TAG_TYPES for type, values in tags.items(): post["tags_" + types[type]] = values def _fetch_post(self, post_id): url = f"{self.root}/api/v2/post/{post_id}" return self.request_json(url) def _pagination(self, endpoint, params=None): url = f"{self.root}/api{endpoint}" if params is None: params = {} params["sortOrder"] = 1 params["status"] = 2 params["take"] = self.per_page threshold = self.per_page while True: data = self.request_json(url, method="POST", json=params) yield from data["items"] if len(data["items"]) < threshold: return params["cursor"] = data.get("cursor") class YiffversePostExtractor(YiffverseExtractor): subcategory = "post" archive_fmt = "{id}" pattern = BASE_PATTERN + r"/post/(\d+)" example = "https://yiffverse.com/post/12345" def posts(self): return (self._fetch_post(self.groups[0]),) class YiffversePlaylistExtractor(YiffverseExtractor): subcategory = "playlist" directory_fmt = ("{category}", "{playlist_id}") archive_fmt = "p_{playlist_id}_{id}" pattern = BASE_PATTERN + r"/playlist/(\d+)" example = "https://yiffverse.com/playlist/12345" def metadata(self): return {"playlist_id": self.groups[0]} def posts(self): endpoint = "/v2/post/search/playlist/" + self.groups[0] return self._pagination(endpoint) class YiffverseTagExtractor(YiffverseExtractor): subcategory = "tag" directory_fmt = ("{category}", "{search_tags}") archive_fmt = "t_{search_tags}_{id}" pattern = BASE_PATTERN + r"/(?:tag/([^/?#]+))?(?:/?\?([^#]+))?(?:$|#)" example = "https://yiffverse.com/tag/TAG" def _init(self): tag, query = self.groups params = text.parse_query(query) self.tags = tags = [] if tag: tags.append(text.unquote(tag)) if "tags" in params: tags.extend(params["tags"].split("|")) type = params.get("type") if type == "video": self.type = 1 elif type == "image": self.type = 0 else: self.type = None def metadata(self): return {"search_tags": " ".join(self.tags)} def posts(self): endpoint = "/v2/post/search/root" params = {"includeTags": [t.replace("_", " ") for t in self.tags]} if self.type is not None: params["type"] = self.type return self._pagination(endpoint, params) 
././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/ytdl.py0000644000175000017500000001215515040344700020067 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2021-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for sites supported by youtube-dl""" from .common import Extractor, Message from .. import ytdl, config, exception class YoutubeDLExtractor(Extractor): """Generic extractor for youtube-dl supported URLs""" category = "ytdl" directory_fmt = ("{category}", "{subcategory}") filename_fmt = "{title}-{id}.{extension}" archive_fmt = "{extractor_key} {id}" pattern = r"ytdl:(.*)" example = "ytdl:https://www.youtube.com/watch?v=abcdefghijk" def __init__(self, match): # import main youtube_dl module ytdl_module = ytdl.import_module(config.get( ("extractor", "ytdl"), "module")) self.ytdl_module_name = ytdl_module.__name__ # find suitable youtube_dl extractor self.ytdl_url = url = match[1] generic = config.interpolate(("extractor", "ytdl"), "generic", True) if generic == "force": self.ytdl_ie_key = "Generic" self.force_generic_extractor = True else: for ie in ytdl_module.extractor.gen_extractor_classes(): if ie.suitable(url): self.ytdl_ie_key = ie.ie_key() break if not generic and self.ytdl_ie_key == "Generic": raise exception.NoExtractorError() self.force_generic_extractor = False if self.ytdl_ie_key == "Generic" and config.interpolate( ("extractor", "ytdl"), "generic-category", True): # set subcategory to URL domain self.category = "ytdl-generic" self.subcategory = url[url.rfind("/", None, 8)+1:url.find("/", 8)] else: # set subcategory to youtube_dl extractor's key self.subcategory = self.ytdl_ie_key Extractor.__init__(self, match) def items(self): # import subcategory module ytdl_module = ytdl.import_module( config.get(("extractor", "ytdl", self.subcategory), "module") or self.ytdl_module_name) self.log.debug("Using %s", ytdl_module) # construct YoutubeDL object extr_opts = { "extract_flat" : "in_playlist", "force_generic_extractor": self.force_generic_extractor, } user_opts = { "retries" : self._retries, "socket_timeout" : self._timeout, "nocheckcertificate" : not self._verify, } if self._proxies: user_opts["proxy"] = self._proxies.get("http") username, password = self._get_auth_info() if username: user_opts["username"], user_opts["password"] = username, password del username, password ytdl_instance = ytdl.construct_YoutubeDL( ytdl_module, self, user_opts, extr_opts) # transfer cookies to ytdl if cookies := self.cookies: set_cookie = ytdl_instance.cookiejar.set_cookie for cookie in cookies: set_cookie(cookie) # extract youtube_dl info_dict try: info_dict = ytdl_instance._YoutubeDL__extract_info( self.ytdl_url, ytdl_instance.get_info_extractor(self.ytdl_ie_key), False, {}, True) except ytdl_module.utils.YoutubeDLError: raise exception.AbortExtraction("Failed to extract video data") if not info_dict: return elif "entries" in info_dict: results = self._process_entries( ytdl_module, ytdl_instance, info_dict["entries"]) else: results = (info_dict,) # yield results for info_dict in results: info_dict["extension"] = None info_dict["_ytdl_info_dict"] = info_dict info_dict["_ytdl_instance"] = ytdl_instance url = "ytdl:" + (info_dict.get("url") or info_dict.get("webpage_url") or self.ytdl_url) yield Message.Directory, info_dict yield Message.Url, url, 
info_dict def _process_entries(self, ytdl_module, ytdl_instance, entries): for entry in entries: if not entry: continue if entry.get("_type") in ("url", "url_transparent"): try: entry = ytdl_instance.extract_info( entry["url"], False, ie_key=entry.get("ie_key")) except ytdl_module.utils.YoutubeDLError: continue if not entry: continue if "entries" in entry: yield from self._process_entries( ytdl_module, ytdl_instance, entry["entries"]) else: yield entry if config.get(("extractor", "ytdl"), "enabled"): # make 'ytdl:' prefix optional YoutubeDLExtractor.pattern = r"(?:ytdl:)?(.*)" ././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1753336256.0 gallery_dl-1.30.2/gallery_dl/extractor/zerochan.py0000644000175000017500000002204015040344700020716 0ustar00mikemike# -*- coding: utf-8 -*- # Copyright 2022-2025 Mike Fährmann # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License version 2 as # published by the Free Software Foundation. """Extractors for https://www.zerochan.net/""" from .booru import BooruExtractor from ..cache import cache from .. import text, util, exception import collections BASE_PATTERN = r"(?:https?://)?(?:www\.)?zerochan\.net" class ZerochanExtractor(BooruExtractor): """Base class for zerochan extractors""" category = "zerochan" root = "https://www.zerochan.net" filename_fmt = "{id}.{extension}" archive_fmt = "{id}" page_start = 1 per_page = 250 cookies_domain = ".zerochan.net" cookies_names = ("z_id", "z_hash") request_interval = (0.5, 1.5) def login(self): self._logged_in = True if self.cookies_check(self.cookies_names): return username, password = self._get_auth_info() if username: return self.cookies_update(self._login_impl(username, password)) self._logged_in = False @cache(maxage=90*86400, keyarg=1) def _login_impl(self, username, password): self.log.info("Logging in as %s", username) url = self.root + "/login" headers = { "Origin" : self.root, "Referer" : url, } data = { "ref" : "/", "name" : username, "password": password, "login" : "Login", } response = self.request(url, method="POST", headers=headers, data=data) if not response.history: raise exception.AuthenticationError() return response.cookies def _parse_entry_html(self, entry_id): url = f"{self.root}/{entry_id}" page = self.request(url).text try: jsonld = self._extract_jsonld(page) except Exception: return {"id": entry_id} extr = text.extract_from(page) data = { "id" : text.parse_int(entry_id), "file_url": jsonld["contentUrl"], "date" : text.parse_datetime(jsonld["datePublished"]), "width" : text.parse_int(jsonld["width"][:-3]), "height" : text.parse_int(jsonld["height"][:-3]), "size" : text.parse_bytes(jsonld["contentSize"][:-1]), "path" : text.split_html(extr( 'class="breadcrumbs', ''))[2:], "uploader": extr('href="/user/', '"'), "tags" : extr('