```
## `soupsieve.select()`
```py3
def select(select, tag, namespaces=None, limit=0, flags=0, **kwargs):
    """Select the specified tags."""
```
`select` returns all tags under the given tag that match the provided CSS selector. You can also limit the
number of tags returned by providing a positive integer via the `limit` parameter (0 means to return all tags).
`select` accepts a CSS selector string, a `Tag`/`BeautifulSoup` object, an optional [namespace](#namespaces) dictionary,
a `limit`, and `flags`.
```pycon3
>>> import soupsieve as sv
>>> sv.select('p:is(.a, .b, .c)', soup)
[<p class="a">Cat</p>, <p class="b">Dog</p>, <p class="c">Mouse</p>]
```
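The effect of `limit` can be sketched as follows. This is a minimal illustration, assuming a small hypothetical document parsed with Python's built-in `html.parser`; the markup and variable names are not part of the documentation above.

```py3
import bs4
import soupsieve as sv

# Hypothetical markup used only for illustration.
markup = '<div><p class="a">Cat</p><p class="b">Dog</p><p class="c">Mouse</p></div>'
soup = bs4.BeautifulSoup(markup, 'html.parser')

# With the default limit of 0, every matching tag is returned.
all_tags = sv.select('p', soup)

# With limit=2, selection stops after the first two matches.
first_two = sv.select('p', soup, limit=2)

print([tag.get_text() for tag in first_two])
```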
## `soupsieve.iselect()`
```py3
def iselect(select, node, namespaces=None, limit=0, flags=0, **kwargs):
    """Select the specified tags."""
```
`iselect` is exactly like `select` except that it returns a generator instead of a list.
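Because `iselect` yields matches lazily, it can be consumed one item at a time. The following is a small sketch under the assumption of a hypothetical list document parsed with `html.parser`:

```py3
import bs4
import soupsieve as sv

# Hypothetical markup used only for illustration.
markup = '<ul><li>one</li><li>two</li><li>three</li></ul>'
soup = bs4.BeautifulSoup(markup, 'html.parser')

# iselect returns a generator rather than building a list up front.
matches = sv.iselect('li', soup)

# Take only the first match; the remaining items are not consumed yet.
first = next(matches)

# Drain whatever is left in the generator.
rest = [tag.get_text() for tag in matches]
```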
## `soupsieve.closest()`
```py3
def closest(select, tag, namespaces=None, flags=0, **kwargs):
    """Match closest ancestor to the provided tag."""
```
`closest` returns the tag closest to the given tag that matches the given selector. The element found must be a direct
ancestor of the tag or the tag itself.
`closest` accepts a CSS selector string, a `Tag`/`BeautifulSoup` object, an optional [namespace](#namespaces)
dictionary, and `flags`.
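As a rough sketch of how `closest` behaves, the example below assumes a hypothetical nested document parsed with `html.parser`. It shows an ancestor match, a match on the tag itself, and the case where nothing matches.

```py3
import bs4
import soupsieve as sv

# Hypothetical markup used only for illustration.
markup = '<div id="outer"><section><p id="target">text</p></section></div>'
soup = bs4.BeautifulSoup(markup, 'html.parser')
target = sv.select_one('#target', soup)

# The nearest ancestor that matches the selector is returned.
ancestor = sv.closest('div', target)

# The starting tag is considered too, so it is returned if it matches.
same = sv.closest('p', target)

# When neither the tag nor any ancestor matches, the result is None.
missing = sv.closest('article', target)
```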
## `soupsieve.match()`
```py3
def match(select, tag, namespaces=None, flags=0, **kwargs):
    """Match node."""
```
The `match` function matches a given tag with a given CSS selector.
`match` accepts a CSS selector string, a `Tag`/`BeautifulSoup` object, an optional [namespace](#namespaces) dictionary,
and flags.
```pycon3
>>> nodes = sv.select('p:is(.a, .b, .c)', soup)
>>> sv.match('p:not(.b)', nodes[0])
True
>>> sv.match('p:not(.b)', nodes[1])
False
```
## `soupsieve.filter()`
```py3
def filter(select, nodes, namespaces=None, flags=0, **kwargs):
    """Filter list of nodes."""
```
`filter` takes an iterable containing HTML nodes and will filter them based on the provided CSS selector string. If
given a `Tag`/`BeautifulSoup` object, it will iterate the direct children filtering them.
`filter` accepts a CSS selector string, an iterable containing nodes, an optional [namespace](#namespaces) dictionary,
and flags.
```pycon3
>>> sv.filter('p:not(.b)', soup.div)
[<p class="a">Cat</p>, <p class="c">Mouse</p>]
```
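The two input forms that `filter` accepts can be compared side by side. This sketch assumes hypothetical markup parsed with `html.parser`; an explicit list of tags and a single tag (whose direct children are filtered) give the same result here.

```py3
import bs4
import soupsieve as sv

# Hypothetical markup used only for illustration.
markup = '<div><p class="a">Cat</p><p class="b">Dog</p><p class="c">Mouse</p></div>'
soup = bs4.BeautifulSoup(markup, 'html.parser')

# Filter an explicit iterable of tags.
nodes = soup.find_all('p')
kept_from_list = sv.filter('p:not(.b)', nodes)

# Passing a tag instead filters that tag's direct children.
kept_from_tag = sv.filter('p:not(.b)', soup.div)

print([tag.get_text() for tag in kept_from_list])
```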
## `soupsieve.escape()`
```py3
def escape(ident):
    """Escape CSS identifier."""
```
`escape` is used to escape CSS identifiers. It follows the [CSS specification][cssom] and escapes any character that
would normally cause an identifier to be invalid.
```pycon3
>>> sv.escape(".foo#bar")
'\\.foo\\#bar'
>>> sv.escape("()[]{}")
'\\(\\)\\[\\]\\{\\}'
>>> sv.escape('--a')
'--a'
>>> sv.escape('0')
'\\30 '
>>> sv.escape('\0')
'�'
```
/// new | New in 1.9.0
`escape` is a new API function added in 1.9.0.
///
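One place `escape` tends to be useful is when building a selector from a value that is not a valid identifier. The sketch below assumes a hypothetical ID containing `.` and `#`, and uses `html.parser`:

```py3
import bs4
import soupsieve as sv

# Hypothetical ID with characters that are not valid in a bare identifier.
raw_id = 'item.1#a'
markup = '<div id="item.1#a">target</div><div id="other">other</div>'
soup = bs4.BeautifulSoup(markup, 'html.parser')

# Escape the dynamic part before splicing it into an ID selector.
selector = '#' + sv.escape(raw_id)
found = sv.select_one(selector, soup)

print(selector)
```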
## `soupsieve.compile()`
```py3
def compile(pattern, namespaces=None, flags=0, **kwargs):
    """Compile CSS pattern."""
```
`compile` will pre-compile a CSS selector pattern returning a `SoupSieve` object. The `SoupSieve` object has the same
selector functions available via the module without the need to specify the selector, namespaces, or flags.
```py3
class SoupSieve:
    """Match tags in Beautiful Soup with CSS selectors."""

    def match(self, tag):
        """Match."""

    def closest(self, tag):
        """Match closest ancestor."""

    def filter(self, iterable):
        """Filter."""

    def select_one(self, tag):
        """Select a single tag."""

    def select(self, tag, limit=0):
        """Select the specified tags."""

    def iselect(self, tag, limit=0):
        """Iterate the specified tags."""
```
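A compiled pattern can be reused across calls without repeating the selector. The following is a minimal sketch, assuming hypothetical markup parsed with `html.parser`:

```py3
import bs4
import soupsieve as sv

# Hypothetical markup used only for illustration.
markup = '<div><p class="a">Cat</p><p class="b">Dog</p><p class="c">Mouse</p></div>'
soup = bs4.BeautifulSoup(markup, 'html.parser')

# Compile once, then reuse the resulting SoupSieve object.
pattern = sv.compile('p:is(.a, .b, .c)')

first = pattern.select_one(soup)          # first matching tag
limited = pattern.select(soup, limit=2)   # at most two matching tags
is_match = pattern.match(first)           # True when the tag matches the pattern
```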
## `soupsieve.purge()`
Soup Sieve caches compiled patterns for performance. If, for whatever reason, you need to purge the cache, simply call
`purge`.
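A short sketch of the call, assuming an arbitrary pattern has been compiled beforehand so that the cache is not empty:

```py3
import soupsieve as sv

# Compiling a pattern places it in Soup Sieve's internal cache.
sv.compile('p.a')

# Discard all cached patterns.
sv.purge()

# Patterns can still be compiled afterwards; they are simply rebuilt.
recompiled = sv.compile('p.a')
```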
## Custom Selectors
The custom selector feature is loosely inspired by the `css-extensions` [proposal][custom-extensions-1]. In its current
form, Soup Sieve allows assigning a complex selector to a custom pseudo-class name. The pseudo-class name must start
with `:--` to avoid conflicts with any future pseudo-classes.
To create custom selectors, you simply need to pass a dictionary containing the custom pseudo-class names (keys) with
the associated CSS selectors that the pseudo-classes are meant to represent (values). It is important to remember that
pseudo-class names are not case sensitive, so even though a dictionary will allow you to specify multiple keys with the
same name (as long as the character cases are different), Soup Sieve will not and will throw an exception if you attempt
to do so.
In the following example, we will define our own custom selector called `#!css :--header` that will be an alias for
`#!css h1, h2, h3, h4, h5, h6`.
```py3
import soupsieve as sv
import bs4
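markup = """
<h1>Header 1</h1>
<h2>Header 2</h2>
<p></p>
<p><span>child</span></p>
"""

soup = bs4.BeautifulSoup(markup, 'html5lib')
print(sv.select(':--header', soup, custom={':--header': 'h1, h2, h3, h4, h5, h6'}))
# Prints: [<h1>Header 1</h1>, <h2>Header 2</h2>]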
```
Custom selectors can also be dependent upon other custom selectors. You don't have to worry about the order in the
dictionary as custom selectors will be compiled "just in time" when they are needed. Be careful though, if you create
a circular dependency, you will get a `SelectorSyntaxError`.
Assuming the same markup as in the first example, we will now create a custom selector that should find any element that
has child elements. We will call the selector `:--parent`. Then we will create another selector called
`:--parent-paragraph` that will use the `:--parent` selector to find `#!html <p>` elements that are also parents:
```py3
custom = {
":--parent": ":has(> *|*)",
":--parent-paragraph": "p:--parent"
}
print(sv.select(':--parent-paragraph', soup, custom=custom))
```
The above code will yield the only paragraph that is a parent:
```
[<p><span>child</span></p>]
```
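The circular dependency case mentioned earlier can be sketched as follows. The two custom selector names are hypothetical and exist only to refer to each other; the expectation that a `SelectorSyntaxError` is raised follows the description above.

```py3
import soupsieve as sv

# Hypothetical custom selectors that depend on each other.
circular = {
    ':--a': ':--b',
    ':--b': ':--a',
}

try:
    # Custom selectors are resolved when the pattern that uses them is compiled.
    sv.compile(':--a', custom=circular)
    raised = False
except sv.SelectorSyntaxError:
    raised = True
```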
## Namespaces
Many of Soup Sieve's selector functions take an optional namespace dictionary. Namespaces, just like CSS, must be
defined for Soup Sieve to evaluate `ns|tag` type selectors. This is analogous to CSS's namespace at-rule:
```css
@namespace url("http://www.w3.org/1999/xhtml");
@namespace svg url("http://www.w3.org/2000/svg");
```
A namespace dictionary should have keys (prefixes) and values (namespaces). An empty string for a key denotes
the default namespace. An empty value would essentially represent a null namespace. To represent the above CSS example for
Soup Sieve, we would configure it like so:
```py3
namespace = {
"": "http://www.w3.org/1999/xhtml", # Default namespace is for XHTML
"svg": "http://www.w3.org/2000/svg", # The SVG namespace defined with prefix of "svg"
}
```
Prefixes used in the namespace dictionary do not have to match the prefixes in the document. The provided prefix is
never compared against the prefixes in the document, only the namespaces are compared. The prefixes in the document are
only there for the parser to know which tags get which namespace. And the prefixes in the namespace dictionary are only
defined in order to provide an alias for the namespaces when using the namespace selector syntax: `ns|name`.
Tags do not necessarily have to have a prefix for Soup Sieve to recognize them either. For instance, in HTML5, SVG
*should* automatically get the SVG namespace. Depending how namespaces were defined in the document, tags may inherit
namespaces in some conditions. Namespace assignment is mainly handled by the parser and exposed through the Beautiful
Soup API. Soup Sieve uses the Beautiful Soup API to then compare namespaces for supported documents.
# Beautiful Soup Differences
Soup Sieve is the official CSS "select" implementation of Beautiful Soup 4.7.0+. While the inclusion of Soup Sieve fixes
many issues and greatly expands CSS support in Beautiful Soup, it does introduce some differences which may surprise
some who've become accustomed to the old "select" implementation.
Beautiful Soup's old select method had numerous limitations and quirks that do not align with the actual CSS
specifications. Most are insignificant, but there are a couple differences that people over the years had come to rely
on. Soup Sieve, which aims to follow the CSS specification closely, does not support these differences.
## Attribute Values
Beautiful Soup was very relaxed when it came to attribute values in selectors: `#!css [attribute=value]`. Beautiful
Soup would allow almost anything for a valid unquoted value. Soup Sieve, on the other hand, follows the CSS
specification and requires that a value be a valid identifier, or it must be quoted. If you get an error complaining
about a malformed attribute, you may need to quote the value.
For instance, if you previously used a selector like this:
```py3
soup.select('[attr={}]')
```
You would need to quote the value as `{}` is not a valid CSS identifier, so it must be quoted:
```py3
soup.select('[attr="{}"]')
```
You can also use the [escape](./api.md#soupsieveescape) function to escape dynamic content:
```py3
import soupsieve
soup.select('[attr=%s]' % soupsieve.escape('{}'))
```
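The three forms above can be compared in one place. This is a sketch assuming a hypothetical document with an attribute named `attr`, parsed with `html.parser`:

```py3
import bs4
import soupsieve as sv

# Hypothetical markup used only for illustration.
markup = '<p attr="{}">braces</p><p attr="plain">plain</p>'
soup = bs4.BeautifulSoup(markup, 'html.parser')

# An unquoted value that is not a valid identifier is rejected.
try:
    soup.select('[attr={}]')
    rejected = False
except sv.SelectorSyntaxError:
    rejected = True

# Quoting the value, or escaping it, produces a valid selector.
quoted = soup.select('[attr="{}"]')
escaped = soup.select('[attr=%s]' % sv.escape('{}'))
```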
## CSS Identifiers
Since Soup Sieve follows the CSS specification, class names, id names, tag names, etc. must be valid identifiers. Since
identifiers, according to the CSS specification, cannot *start* with a number, some users may find that their old class,
id, or tag name selectors that started with numbers will not work. To specify such selectors, you'll have to use CSS
escapes.
So if you used to use:
```py3
soup.select('.2class')
```
You would need to update with:
```py3
soup.select(r'.\32 class')
```
Numbers in the middle or at the end of a class will work as they always did:
```py3
soup.select('.class2')
```
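Rather than writing the escape by hand, the `escape` function can generate it. A minimal sketch, assuming hypothetical class names and `html.parser`:

```py3
import bs4
import soupsieve as sv

# Hypothetical markup used only for illustration.
markup = '<p class="2class">numeric start</p><p class="class2">numeric end</p>'
soup = bs4.BeautifulSoup(markup, 'html.parser')

# escape() turns the leading digit into a CSS escape sequence.
escaped_selector = '.' + sv.escape('2class')
leading = soup.select(escaped_selector)

# A digit that is not the first character needs no escaping.
trailing = soup.select('.class2')
```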
## Relative Selectors
Whether on purpose or by accident, Beautiful Soup used to allow relative selectors:
```py3
soup.select('> div')
```
The above is not a valid CSS selector according to the CSS specifications. Relative selector lists have only recently been
added to the CSS specifications, and they are only allowed in a `#!css :has()` pseudo-class:
```css
article:has(> div)
```
But, in the level 4 CSS specifications, the `:scope` pseudo-class has been added which allows for the same feel as using
`#!css > div`. Since Soup Sieve supports the `:scope` pseudo-class, it can be used to produce the same behavior as the
legacy select method.
In CSS, the `:scope` pseudo-class represents the element that the CSS select operation is called on. In supported
browsers, the following JavaScript example would treat `:scope` as the element that `el` references:
```js
el.querySelectorAll(':scope > .class')
```
Just like in the JavaScript example above, Soup Sieve would also treat `:scope` as the element that `el` references:
```py3
el.select(':scope > .class')
```
In the case where the element is the document node, `:scope` would simply represent the root element of the document.
So, if you used to have selectors such as:
```py3
soup.select('> div')
```
You can simply add `:scope`, and it should work the same:
```py3
soup.select(':scope > div')
```
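To make the difference between a scoped child selector and a plain descendant selector concrete, here is a small sketch. It assumes hypothetical nested `div` elements parsed with `html.parser`, and calls `select` on a specific element rather than on the document.

```py3
import bs4

# Hypothetical markup used only for illustration.
markup = '<div id="outer"><div id="child"><div id="grandchild"></div></div></div>'
soup = bs4.BeautifulSoup(markup, 'html.parser')
el = soup.find(id='outer')

# Only direct children of the element that select is called on.
direct = el.select(':scope > div')

# Every descendant div, regardless of depth.
descendants = el.select('div')
```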
While this will generally give you what is expected for the relative, descendant selectors, this will not work for
sibling selectors, and the reasons why are covered in more detail in [Out of Scope Selectors](#out-of-scope-selectors).
## Out of Scope Selectors
In a browser, when requesting a selector via `querySelectorAll`, the element that `querySelectorAll` is called on is
the *scoped* element. So in the following example, `el` is the *scoped* element.
```js
el.querySelectorAll('.class')
```
This same concept applies to Soup Sieve, where the element that `select` or `select_one` is called on is also the
*scoped* element. So in the following example, `el` is also the *scoped* element:
```py3
el.select('.class')
```
In browsers, `querySelectorAll` and `querySelector` only return elements under the *scoped* element. They do not return
the *scoped* element itself, its parents, or its siblings. Only when `querySelectorAll` or `querySelector` is called on
the document node will it return the *scoped* selector, which would be the *root* element, as the query is being called
on the document itself and not the *scoped* element.
Soup Sieve aims to essentially mimic the browser functions such as `querySelector`, `querySelectorAll`, `matches`, etc.
In Soup Sieve `select` and `select_one` are analogous to `querySelectorAll` and `querySelector` respectively. For this
reason, Soup Sieve also only returns elements under the *scoped* element. The idea is to provide a familiar interface
that behaves, as close as possible, to what people familiar with CSS selectors are used to.
So while Soup Sieve will find elements relative to `:scope` with `>` or ` ` (the descendant combinator):
```py3
soup.select(':scope > div')
```
It will not find elements relative to `:scope` with `+` or `~` as siblings to the *scoped* element are not under the
*scoped* element:
```py3
soup.select(':scope + div')
```
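The following sketch illustrates this, assuming two hypothetical sibling `div` elements parsed with `html.parser`. The sibling is only reachable when the selection is made from an element that contains it, such as the parent.

```py3
import bs4

# Hypothetical markup used only for illustration.
markup = '<section><div id="first"><p>inside</p></div><div id="second"></div></section>'
soup = bs4.BeautifulSoup(markup, 'html.parser')
el = soup.find(id='first')

# Children of the scoped element are found.
children = el.select(':scope > p')

# Siblings are outside the scoped element, so nothing is returned.
siblings = el.select(':scope + div')

# Selecting from the parent brings the sibling back into range.
from_parent = el.parent.select('#first + div')
```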
This is by design and aligns with the behavior exhibited in all web browsers.
## Selected Element Order
Another quirk of Beautiful Soup's old implementation was that it returned the HTML nodes in the order of how the
selectors were defined. For instance, Beautiful Soup, if given the pattern `#!css article, body` would first return
`#!html <article>` and then `#!html <body>`.
Soup Sieve does not, and frankly cannot, honor Beautiful Soup's old ordering convention due to the way it is designed.
Soup Sieve returns the nodes in the order they are defined in the document as that is how the elements are searched.
This is much more efficient and provides better performance.
So, given the earlier selector pattern of `article, body`, Soup Sieve would return the element `#!html <body>` and then
`#!html <article>` as that is how it is ordered in the HTML document.
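Document ordering can be checked with a short sketch, assuming a hypothetical document in which `article` is nested inside `body`, parsed with `html.parser`:

```py3
import bs4

# Hypothetical markup used only for illustration.
markup = '<html><body><article>text</article></body></html>'
soup = bs4.BeautifulSoup(markup, 'html.parser')

# The selector lists article first, but results follow document order.
names = [tag.name for tag in soup.select('article, body')]
print(names)
```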
# Frequently Asked Questions
## Why do selectors not work the same in Beautiful Soup 4.7+?
Soup Sieve is the official CSS selector library in Beautiful Soup 4.7+, and with this change, Soup Sieve introduces a
number of changes that break some of the expected behaviors that existed in versions prior to 4.7.
In short, Soup Sieve follows the CSS specifications fairly closely, and this broke a number of non-standard behaviors.
These non-standard behaviors were not allowed according to the CSS specifications. Soup Sieve has no intentions of
bringing back these behaviors.
For more details on specific changes, and the reasoning why a specific change is considered a good change, or simply a
feature that Soup Sieve cannot/will not support, see [Beautiful Soup Differences](./differences.md).
## How does `iframe` handling work?
In web browsers, CSS selectors do not usually select content inside an `iframe` element if the selector is called on an
element outside of the `iframe`. Each HTML document is usually encapsulated and CSS selector leakage across this
`iframe` boundary is usually prevented.
In its current iteration, Soup Sieve is not aware of the origin of the documents in the `iframe`, and Soup Sieve will
not prevent selectors from crossing these boundaries. Soup Sieve is not used to style documents, but to scrape
documents. For this reason, it seems to be more helpful to allow selector combinators to cross these boundaries.
Soup Sieve isn't entirely unaware of `iframe` elements though. In Soup Sieve 1.9.1, it was noticed that some
pseudo-classes behaved in unexpected ways without awareness of `iframes`; this was fixed in 1.9.1. Pseudo-classes such
as [`:default`](./selectors/pseudo-classes.md#:default), [`:indeterminate`](./selectors/pseudo-classes.md#:indeterminate),
[`:dir()`](./selectors/pseudo-classes.md#:dir), [`:lang()`](./selectors/pseudo-classes.md#:lang),
[`:root`](./selectors/pseudo-classes.md#:root), and [`:contains()`](./selectors/pseudo-classes.md#:contains) were
given awareness of `iframes` to ensure they behaved properly and returned the expected elements. This doesn't mean that
`select` won't return elements in `iframes`, but it won't allow something like `:default` to select a `button` in an
`iframe` whose parent `form` is outside the `iframe`. Or better put, a default `button` will be evaluated in the context
of the document it is in.
With all of this said, if your selectors have issues with `iframes`, it is most likely because `iframes` are handled
differently by different parsers. `html.parser` will usually parse `iframe` elements as it sees them. `lxml` parser will
often remove `html` and `body` tags of an `iframe` HTML document. `lxml-xml` will simply ignore the content in an XHTML
document. And `html5lib` will HTML escape the content of an `iframe` making traversal impossible.
In short, Soup Sieve will return elements from all documents, even `iframes`. But certain pseudo-classes may take into
consideration the context of the document they are in. But even with all of this, a parser's handling of `iframes` may
make handling its content difficult if it doesn't parse it as HTML elements, or augments its structure.
# Quick Start
## Overview
Soup Sieve is a CSS selector library designed to be used with [Beautiful Soup 4][bs4]. It aims to provide selecting,
matching, and filtering using modern CSS selectors. Soup Sieve currently provides selectors from the CSS level 1
specifications up through the latest CSS level 4 drafts and beyond (though some are not yet implemented).
Soup Sieve was written with the intent to replace Beautiful Soup's builtin select feature, and as of Beautiful Soup
version 4.7.0, it now is :confetti_ball:. Soup Sieve can also be imported in order to use its API directly for
more controlled, specialized parsing.
Soup Sieve has implemented most of the CSS selectors up through the latest CSS draft specifications, though there are a
number that don't make sense in a non-browser environment. Selectors that cannot provide meaningful functionality simply
do not match anything. Some of the supported selectors are:
- `#!css .classes`
- `#!css #ids`
- `#!css [attributes=value]`
- `#!css parent child`
- `#!css parent > child`
- `#!css sibling ~ sibling`
- `#!css sibling + sibling`
- `#!css :not(element.class, element2.class)`
- `#!css :is(element.class, element2.class)`
- `#!css parent:has(> child)`
- and [many more](./selectors/index.md)
## Installation
You must have Beautiful Soup already installed:
```
pip install beautifulsoup4
```
In most cases, assuming you've installed version 4.7.0 or later, that should be all you need to do, but if you've installed via
some alternative method, and Soup Sieve is not automatically installed, you can install it directly:
```
pip install soupsieve
```
If you want to manually install it from source, first ensure that [`build`][build] is installed:
```
pip install build
```
Then navigate to the root of the project, build the wheel, and install it (replacing `<ver>` with the current version):
```
python -m build -w
pip install dist/soupsieve-<ver>-py3-none-any.whl
```
## Usage
To use Soup Sieve, you must create a `BeautifulSoup` object:
```pycon3
>>> import bs4
>>> text = """
... <div>
... <!-- These are animals -->
... <p class="a">Cat</p>
... <p class="b">Dog</p>
... <p class="c">Mouse</p>
... </div>
... """
>>> soup = bs4.BeautifulSoup(text, 'html5lib')
```
For most people, using the Beautiful Soup 4.7.0+ API may be more than sufficient. Beautiful Soup offers two methods that employ
Soup Sieve: `select` and `select_one`. Beautiful Soup's select API is identical to Soup Sieve's, except that you don't
have to hand it the tag object, the calling object passes itself to Soup Sieve:
```pycon3
>>> soup = bs4.BeautifulSoup(text, 'html5lib')
>>> soup.select_one('p:is(.a, .b, .c)')
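<p class="a">Cat</p>
>>> soup.select('p:is(.a, .b, .c)')
[<p class="a">Cat</p>, <p class="b">Dog</p>, <p class="c">Mouse</p>]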
```
You can also use the Soup Sieve API directly to get access to the full range of possibilities that Soup Sieve offers.
You can select a single tag:
```pycon3
>>> import soupsieve as sv
>>> sv.select_one('p:is(.a, .b, .c)', soup)
<p class="a">Cat</p>
```
You can select all tags:
```pycon3
>>> import soupsieve as sv
>>> sv.select('p:is(.a, .b, .c)', soup)
[<p class="a">Cat</p>, <p class="b">Dog</p>, <p class="c">Mouse</p>]
```
You can select the closest ancestor:
```pycon3
>>> import soupsieve as sv
>>> el = sv.select_one('.c', soup)
>>> sv.closest('div', el)
<div>
<!-- These are animals -->
<p class="a">Cat</p>
<p class="b">Dog</p>
<p class="c">Mouse</p>
</div>
```
You can filter a tag's children (or an iterable of tags):
```pycon3
>>> sv.filter('p:not(.b)', soup.div)
[<p class="a">Cat</p>, <p class="c">Mouse</p>]
```
You can match a single tag:
```pycon3
>>> els = sv.select('p:is(.a, .b, .c)', soup)
>>> sv.match('p:not(.b)', els[0])
True
>>> sv.match('p:not(.b)', els[1])
False
```
Or even just extract comments:
```pycon3
>>> sv.comments(soup)
[' These are animals ']
```
Selectors do not have to be constrained to one line either. You can span selectors over multiple lines just like you
would in a CSS file.
```pycon3
>>> selector = """
... .a,
... .b,
... .c
... """
>>> sv.select(selector, soup)
[<p class="a">Cat</p>, <p class="b">Dog</p>, <p class="c">Mouse</p>]
```
You can even use comments to annotate a particularly complex selector.
```pycon3
>>> selector = """
... /* This isn't complicated, but we're going to annotate it anyways.
... This is the a class */
... .a,
... /* This is the b class */
... .b,
... /* This is the c class */
... .c
... """
>>> sv.select(selector, soup)
[<p class="a">Cat</p>, <p class="b">Dog</p>, <p class="c">Mouse</p>]
```
If you've ever used Python's `re` library for regular expressions, you may know that it is often useful to pre-compile a
regular expression pattern, especially if you plan to use it more than once. The same is true for Soup Sieve's
matchers, though it is not required. If you have a pattern that you want to use more than once, it may be wise to
pre-compile it early on:
```pycon3
>>> selector = sv.compile('p:is(.a, .b, .c)')
>>> selector.filter(soup.div)
[
...
...
... """
>>> soup = bs(html, 'html5lib')
>>> print(soup.select('[href]'))
[Internal link, Example link, Insensitive internal link, Example org link]
```
////
///
/// define
`[attribute=value]`
- Represents elements with an attribute named **attribute** that also has a value of **value**.
//// tab | Syntax
```css
[attr=value]
[attr="value"]
```
////
//// tab | Usage
```pycon3
>>> from bs4 import BeautifulSoup as bs
>>> html = """
...
...
...
...
...
...
... """
>>> soup = bs(html, 'html5lib')
>>> print(soup.select('a[href!="#internal"]'))
[Example link, Insensitive internal link, Example org link]
```
////
///
/// define
`[attribute operator value i]`:material-flask:{: title="Experimental" data-md-color-primary="purple" .icon}
- Represents elements with an attribute named **attribute** and whose value, when the **operator** is applied, matches
**value** *without* case sensitivity. In general, attribute comparison is insensitive in normal HTML, but not XML.
`i` is most useful in XML documents.
//// tab | Syntax
```css
[attr=value i]
[attr="value" i]
```
////
//// tab | Usage
```pycon3
>>> from bs4 import BeautifulSoup as bs
>>> html = """
...
...
...
...
...
...
... """
>>> soup = bs(html, 'html5lib')
>>> print(soup.select('[href="#INTERNAL" s]'))
[]
>>> print(soup.select('[href="#internal" s]'))
[Internal link]
```
////
///
## Namespace Selectors
Namespace selectors are used in conjunction with type and universal selectors as well as attribute names in attribute
selectors. They are specified by declaring the namespace and the selector separated with `|`: `namespace|selector`.
`namespace`, in this context, is the prefix defined via the [namespace dictionary](../api.md#namespaces). The prefix
defined for the CSS selector does not need to match the prefix name in the document as it is the namespace associated
with the prefix that is compared, not the prefix itself.
The universal selector (`*`) can be used to represent any namespace just as it can with types.
By default, type selectors without a namespace selector will match any element whose type matches, regardless of
namespace. But if a CSS default namespace is declared (one with an empty key: `{"": "http://www.w3.org/1999/xhtml"}`),
all type selectors will assume the default namespace unless an explicit namespace selector is specified. For example,
if the default namespace was defined to be `http://www.w3.org/1999/xhtml`, the selector `a` would only match `a` tags that
are within the `http://www.w3.org/1999/xhtml` namespace. The one exception is within pseudo classes (`:not()`, `:has()`,
etc.) as namespaces are not considered within pseudo classes unless one is explicitly specified.
If the namespace is omitted (`|element`), any element without a namespace will be matched. In HTML documents that
support namespaces (XHTML and HTML5), HTML elements are counted as part of the `http://www.w3.org/1999/xhtml` namespace,
but attributes usually do not have a namespace unless one is explicitly defined in the markup.
Namespaces can be used with attribute selectors as well, except that when `[|attribute]` is used, it is equivalent to
`[attribute]`.
/// tab | Syntax
```css
ns|element
ns|*
*|*
*|element
|element
[ns|attr]
[*|attr]
[|attr]
```
///
/// tab | Usage
```pycon3
>>> from bs4 import BeautifulSoup as bs
>>> html = """
...
...
...
...
...
...
...
...
... """
>>> soup = bs(html, 'html5lib')
>>> print(soup.select('svg|a', namespaces={'svg': 'http://www.w3.org/2000/svg'}))
[MDN Web Docs]
>>> print(soup.select('a', namespaces={'svg': 'http://www.w3.org/2000/svg'}))
[Soup Sieve Docs, MDN Web Docs]
>>> print(soup.select('a', namespaces={'': 'http://www.w3.org/1999/xhtml', 'svg': 'http://www.w3.org/2000/svg'}))
[Soup Sieve Docs]
>>> print(soup.select('[xlink|href]', namespaces={'xlink': 'http://www.w3.org/1999/xlink'}))
[MDN Web Docs]
>>> print(soup.select('[|href]', namespaces={'xlink': 'http://www.w3.org/1999/xlink'}))
[Soup Sieve Docs]
```
///
--8<--
selector_styles.md
--8<--
# Combinators and Selector Lists
CSS employs a number of tokens in order to represent lists or to provide relational context between two selectors.
## Selector Lists
Selector lists use the comma (`,`) to join multiple selectors in a list. When presented with a selector list, any
selector in the list that matches an element will return that element.
/// tab | Syntax
```css
element1, element2
```
///
/// tab | Usage
```pycon3
>>> from bs4 import BeautifulSoup as bs
>>> html = """
...
...
...
...
]
```
///
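A selector list can be sketched with a minimal example. The markup below is hypothetical and parsed with `html.parser`; any element matching either selector in the list is returned.

```py3
import bs4

# Hypothetical markup used only for illustration.
markup = '<div><h1>Title</h1><p>Paragraph</p><span>Span</span></div>'
soup = bs4.BeautifulSoup(markup, 'html.parser')

# Elements matching either h1 or p are returned; span is not.
names = [tag.name for tag in soup.select('h1, p')]
```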
## Descendant Combinator
Descendant combinators combine two selectors with whitespace (` `) in order to signify that the second
element is matched if it has an ancestor that matches the first element.
/// tab | Syntax
```css
parent descendant
```
///
/// tab | Usage
```pycon3
>>> from bs4 import BeautifulSoup as bs
>>> html = """
...
...
...
...
]
```
///
/// tip | Additional Reading
https://developer.mozilla.org/en-US/docs/Web/CSS/Descendant_combinator
///
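As a small sketch of the descendant combinator, assuming hypothetical markup parsed with `html.parser`, paragraphs at any depth under a `div` are matched, while a paragraph outside it is not:

```py3
import bs4

# Hypothetical markup used only for illustration.
markup = (
    '<body>'
    '<div><p>direct</p><section><p>nested</p></section></div>'
    '<p>outside</p>'
    '</body>'
)
soup = bs4.BeautifulSoup(markup, 'html.parser')

# Any p with a div ancestor matches, no matter how deeply nested.
texts = [tag.get_text() for tag in soup.select('div p')]
```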
## Child combinator
Child combinators combine two selectors with `>` in order to signify that the second element is matched if it has a
parent that matches the first element.
/// tab | Syntax
```css
parent > child
```
///
/// tab | Usage
```pycon3
>>> from bs4 import BeautifulSoup as bs
>>> html = """
...
...
...
...
]
```
///
/// tip | Additional Reading
https://developer.mozilla.org/en-US/docs/Web/CSS/Child_combinator
///
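The child combinator is stricter than the descendant combinator. In this sketch, which assumes hypothetical markup parsed with `html.parser`, only the paragraph whose parent is the `div` is matched:

```py3
import bs4

# Hypothetical markup used only for illustration.
markup = '<div><p>direct</p><section><p>nested</p></section></div>'
soup = bs4.BeautifulSoup(markup, 'html.parser')

# Only p elements whose immediate parent is a div match.
texts = [tag.get_text() for tag in soup.select('div > p')]
```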
## General sibling combinator
General sibling combinators combine two selectors with `~` in order to signify that the second element is matched if it
has a sibling that precedes it that matches the first element.
/// tab | Syntax
```css
prevsibling ~ sibling
```
///
/// tab | Usage
```pycon3
>>> from bs4 import BeautifulSoup as bs
>>> html = """
...
...
...
...
]
```
///
/// tip | Additional Reading
https://developer.mozilla.org/en-US/docs/Web/CSS/General_sibling_combinator
///
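A minimal sketch of the general sibling combinator, assuming hypothetical markup parsed with `html.parser`. Every `p` that follows the `h2` under the same parent matches; the `p` that comes before it does not.

```py3
import bs4

# Hypothetical markup used only for illustration.
markup = '<section><p id="a">a</p><h2 id="b">b</h2><p id="c">c</p><p id="d">d</p></section>'
soup = bs4.BeautifulSoup(markup, 'html.parser')

# All later p siblings of the h2 are matched.
ids = [tag['id'] for tag in soup.select('h2 ~ p')]
```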
## Adjacent sibling combinator
Adjacent sibling combinators combine two selectors with `+` in order to signify that the second element is matched if it
has an adjacent sibling that precedes it that matches the first element.
/// tab | Syntax
```css
prevsibling + nextsibling
```
///
/// tab | Usage
```pycon3
>>> from bs4 import BeautifulSoup as bs
>>> html = """
...
...
...
...
]
```
///
/// tip | Additional Reading
https://developer.mozilla.org/en-US/docs/Web/CSS/Adjacent_sibling_combinator
///
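For contrast with the general sibling combinator, this sketch (hypothetical markup, parsed with `html.parser`) shows that only the `p` immediately following the `h2` is matched:

```py3
import bs4

# Hypothetical markup used only for illustration.
markup = '<section><p id="a">a</p><h2 id="b">b</h2><p id="c">c</p><p id="d">d</p></section>'
soup = bs4.BeautifulSoup(markup, 'html.parser')

# Only the p directly after the h2 is matched.
ids = [tag['id'] for tag in soup.select('h2 + p')]
```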
--8<--
selector_styles.md
--8<--
# General Details
## Implementation Specifics
The CSS selectors are based on the CSS specification and include not only stable selectors, but may also include
selectors currently under development from the draft specifications. Primarily support has been added for selectors that
were feasible to implement and most likely to get practical use. In addition to the selectors in the specification,
Soup Sieve also supports a couple non-standard selectors.
Soup Sieve aims to allow users to target XML/HTML elements with CSS selectors. It implements many pseudo classes, but it
does not currently implement any pseudo elements and has no plans to do so. Soup Sieve also will not match anything for
pseudo classes that are only relevant in a live, browser environment, but it will gracefully handle them if they've been
implemented; such pseudo classes are non-applicable in the Beautiful Soup environment and are noted in [Non-Applicable
Pseudo Classes](./unsupported.md#non-applicable-pseudo-classes).
When speaking about namespaces, they only apply to XML, XHTML, or when dealing with recognized foreign tags in HTML5.
Currently, Beautiful Soup's `html5lib` parser is the only parser that will return the appropriate namespaces for an HTML5
document. If you are using XHTML, you have to use Beautiful Soup's `lxml-xml` parser (or `xml` for short) to get the
appropriate namespaces in an XHTML document. In addition to using the correct parser, you must provide a dictionary of
namespaces to Soup Sieve in order to use namespace selectors. See the documentation on
[namespaces](../api.md#namespaces) to learn more.
While an effort is made to mimic CSS selector behavior, there may be some differences or quirks, please report issues if
any are found.
## Selector Context Key
Some selectors are very specific to HTML and either have no meaningful representation in XML, or such functionality has
not been implemented. Selectors that are HTML only will be noted with :material-language-html5:{: data-md-color-primary="orange"},
and will match nothing if used in XML.
Soup Sieve has implemented a couple non-standard selectors. These can contain useful selectors that were rejected
from the official CSS specifications, selectors implemented by other systems such as jQuery, or even selectors
specifically created for Soup Sieve. If a selector is considered non standard, it will be marked with
:material-star:{: title="Custom" data-md-color-primary="green"}.
All selectors that are from the current working draft of CSS4 are considered experimental and are marked with
:material-flask:{: title="Experimental" data-md-color-primary="purple"}. Additionally, if there are other immature selectors, they may be marked as experimental as
well. Experimental may mean we are not entirely sure if our implementation is correct, that things may still be in flux
as they are part of a working draft, or even both.
If at any time a working draft drops a selector from the current draft, it will most likely also be removed here,
usually with a deprecation path, except where there may be a conflict that requires a less graceful transition.
One exception is in the rare case that the selector is found to be far too useful despite being rejected. In these
cases, we may adopt them as "custom" selectors.
/// tip | Additional Reading
If usage of a selector is not clear in this documentation, you can find more information by reading these
specification documents:
[CSS Level 3 Specification](https://www.w3.org/TR/selectors-3/)
: Contains the latest official document outlining official behaviors of CSS selectors.
[CSS Level 4 Working Draft](https://www.w3.org/TR/selectors-4/)
: Contains the latest published working draft of the CSS level 4 selectors which outlines the experimental new
selectors and experimental behavioral changes.
[HTML5](https://www.w3.org/TR/html50/)
: The HTML 5.0 specification document. Defines the semantics regarding HTML.
[HTML Living Standard](https://html.spec.whatwg.org/)
: The HTML Living Standard document. Defines semantics regarding HTML.
///
## Selector Terminology
Certain terminology is used throughout this document when describing selectors. In order to fully understand the syntax
a selector may implement, it is important to understand a couple of key terms.
### Selector
Selector is used to describe any selector whether it is a [simple](#simple-selector), [compound](#compound-selector), or
[complex](#complex-selector) selector.
### Simple Selector
A simple selector represents a single condition on an element. It can be a [type selector](#type-selectors),
[universal selector](#universal-selectors), [ID selector](#id-selectors), [class selector](#class-selectors),
[attribute selector](#attribute-selectors), or [pseudo class selector](#pseudo-classes).
### Compound Selector
A [compound](#compound-selector) selector is a sequence of [simple](#simple-selector) selectors that does not contain
any [combinators](#combinators-and-selector-lists). If a universal or type selector is used, it must come first, and
only one of them can be used: a universal selector and a type selector cannot both appear in the same compound selector.
### Complex Selector
A complex selector consists of multiple [simple](#simple-selector) or [compound](#compound-selector) selectors joined
with [combinators](#combinators-and-selector-lists).
### Selector List
A selector list is a list of selectors joined with a comma (`,`). A selector list is used to specify that a match is
valid if any of the selectors in a list matches.
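The terms above can be illustrated with a small sketch. The markup, the `id` values, and the `ids` helper below are
made up for illustration; only `soupsieve.select` and Beautiful Soup's `html.parser` backend are assumed to be
available.

```python
from bs4 import BeautifulSoup
import soupsieve as sv

# Hypothetical markup used only for this illustration.
html = """
<div id="main" class="box">
  <p id="p1" class="note">one</p>
  <p id="p2">two</p>
  <span id="s1" class="note">three</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")


def ids(selector):
    """Return the `id` of every element matched by `selector`, in document order."""
    return [el["id"] for el in sv.select(selector, soup)]


# Simple selector: a single condition (here, a class).
simple = ids(".note")

# Compound selector: a type selector followed by a class selector, no combinators.
compound = ids("p.note")

# Complex selector: compound selectors joined with the child combinator.
complex_ = ids("div > span.note")

# Selector list: matches if any selector in the comma-separated list matches.
selector_list = ids("p.note, #p2")

print(simple, compound, complex_, selector_list)
```

Each call narrows or widens the match set in the way the definitions describe: the compound selector adds a condition
to the simple one, the complex selector adds a structural relationship, and the selector list combines alternatives.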
--8<--
selector_styles.md
--8<--
soupsieve-2.7/docs/src/markdown/selectors/pseudo-classes.md
# Pseudo-Classes
## Overview
These are pseudo classes that are either fully or partially supported. Partial support is usually due to the
limitations of not being in a live browser environment. Pseudo classes that cannot be implemented are found under
[Non-Applicable Pseudo Classes](./unsupported.md/#non-applicable-pseudo-classes). Any selectors that are not found here
or under the non-applicable pseudo classes are either under consideration, have not yet been evaluated, or are too new
and viewed as a risk to implement as they might not stick around.
## `:any-link`:material-language-html5:{: title="HTML" data-md-color-primary="orange" .icon} {:#:any-link}
Selects every `#!html <a>` or `#!html <area>` element that has an `href` attribute, independent of
whether it has been visited.
/// tab | Syntax
```css
:any-link
```
///
/// tab | Usage
```pycon3
>>> from bs4 import BeautifulSoup as bs
>>> html = """
...
...
...
...
...
...
... """
>>> soup = bs(html, 'html5lib')
>>> print(soup.select(':any-link'))
[click]
```
///
/// tip | Additional Reading
https://developer.mozilla.org/en-US/docs/Web/CSS/:any-link
///
/// new | New in 2.2
The CSS specification was recently updated to not include `#!html <link>` in the definition; therefore, Soup Sieve has
removed it as well.
///
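As a minimal sketch of the behavior described for `:any-link`, the following uses made-up markup and `id` values
(they are assumptions, not taken from the original example) and assumes Soup Sieve 2.2 or later, where `<link>`
elements are no longer expected to match.

```python
from bs4 import BeautifulSoup
import soupsieve as sv

# Hypothetical markup: two anchors (one without `href`), an `area`, and a `link`.
html = """
<html>
<head><link id="l1" rel="stylesheet" href="style.css"></head>
<body>
<a id="a1" href="http://example.com">click</a>
<a id="a2">no href</a>
<map name="m"><area id="ar1" href="#here" alt="here"></map>
</body>
</html>
"""
soup = BeautifulSoup(html, "html.parser")

# Only `a` and `area` elements that carry an `href` attribute should match.
any_link_ids = [el["id"] for el in sv.select(":any-link", soup)]
print(any_link_ids)
```

The anchor without `href` and the `<link>` element are left out of the result.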
## `:checked`:material-language-html5:{: title="HTML" data-md-color-primary="orange" .icon} {:#:checked}
Selects any `#!html <input type="radio"/>`, `#!html <input type="checkbox"/>`, or `#!html <option>` element that is
checked or toggled to an on state.
/// tip | Additional Reading
https://developer.mozilla.org/en-US/docs/Web/CSS/:checked
///
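A minimal sketch of `:checked`, assuming Beautiful Soup's `html.parser` backend. The form markup and `id` values are
hypothetical and only serve to show which kinds of elements are selected.

```python
from bs4 import BeautifulSoup
import soupsieve as sv

# Hypothetical form: one checked checkbox, one checked radio, one selected option.
html = """
<form>
  <input id="c1" type="checkbox" checked>
  <input id="c2" type="checkbox">
  <input id="r1" type="radio" name="r" checked>
  <input id="r2" type="radio" name="r">
  <select>
    <option id="o1" selected>Grapes</option>
    <option id="o2">Apples</option>
  </select>
</form>
"""
soup = BeautifulSoup(html, "html.parser")

# Checkboxes and radios match via `checked`; options match via `selected`.
checked_ids = [el["id"] for el in sv.select(":checked", soup)]
print(checked_ids)
```

Because there is no live browser state, matching relies purely on the `checked` and `selected` attributes present in
the markup.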
## `:default`:material-language-html5:{: title="HTML" data-md-color-primary="orange" .icon} {:#:default}
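Selects any form element that is the default among a group of related elements, including: `#!html <button>`,
`#!html <input type="checkbox">`, `#!html <input type="radio">`, and `#!html <option>` elements.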
"""
self.assert_selector(
markup,
":default",
['summer', 'd1', 'd3', 'hamster', 'enable'],
flags=util.HTML
)
def test_iframe(self):
"""Test with `iframe`."""
markup = """
"""
self.assert_selector(
markup,
":default",
['d1', 'd3', 'd4'],
flags=util.PYHTML
)
def test_nested_form(self):
"""
Test nested form.
This is technically invalid use of forms, but browsers will generally evaluate first in the nested forms.
"""
markup = """
"""
self.assert_selector(
markup,
":default",
['d1'],
flags=util.HTML
)
def test_default_cached(self):
"""
Test that we use the cached "default".
For the sake of coverage, we will do this impractical select
to ensure we reuse the cached default.
"""
markup = """
"""
self.assert_selector(
markup,
":default:default",
['d1'],
flags=util.HTML
)
def test_nested_form_fail(self):
"""
Test that the search for elements will bail after the first nested form.
You shouldn't nest forms, but if you do, when a parent form encounters a nested form,
we will bail evaluation like browsers do. We should see button 1 getting found for nested
form, but button 2 will not be found for parent form.
"""
markup = """
"""
self.assert_selector(
markup,
":default",
[],
flags=util.HTML
)
soupsieve-2.7/tests/test_level4/test_defined.py
"""Test defined selectors."""
from .. import util
class TestDefined(util.TestCase):
"""Test defined selectors."""
def test_defined_html(self):
"""Test defined HTML."""
markup = """
"""
self.assert_selector(
markup,
'body :defined',
['0', '2', '3'],
flags=util.HTML
)
@util.skip_no_lxml
def test_defined_xhtml(self):
"""Test defined XHTML."""
markup = """
"""
from lxml import etree
self.assert_selector(
markup,
'body :defined',
# We should get 3, but for LXML versions less than 4.4.0 we don't for reasons stated above.
['0', '2'] if etree.LXML_VERSION < (4, 4, 0, 0) else ['0', '1', '2'],
flags=util.XHTML
)
def test_defined_xml(self):
"""Test defined XML."""
markup = """
"""
# Defined is a browser thing.
# XML doesn't care about defined and this will match nothing in XML.
self.assert_selector(
markup,
'body :defined',
[],
flags=util.XML
)
soupsieve-2.7/tests/test_level4/test_dir.py
"""Test direction selectors."""
from .. import util
import soupsieve as sv
class TestDir(util.TestCase):
"""Test direction selectors."""
MARKUP = """
test1
test2
עִבְרִית()
עִבְרִית
test3
"""
def test_dir_rtl(self):
"""Test general direction right to left."""
self.assert_selector(
self.MARKUP,
"div:dir(rtl)",
["1", "4", "6"],
flags=util.HTML
)
def test_dir_ltr(self):
"""Test general direction left to right."""
self.assert_selector(
self.MARKUP,
"div:dir(ltr)",
["3"],
flags=util.HTML
)
def test_dir_conflict(self):
"""Test conflicting direction."""
self.assert_selector(
self.MARKUP,
"div:dir(ltr):dir(rtl)",
[],
flags=util.HTML
)
def test_dir_xml(self):
"""Test direction with XML (not supported)."""
self.assert_selector(
self.MARKUP,
"div:dir(ltr)",
[],
flags=util.XML
)
def test_dir_bidi_detect(self):
"""Test bidirectional detection."""
self.assert_selector(
self.MARKUP,
"span:dir(rtl)",
['2', '5', '7'],
flags=util.HTML
)
self.assert_selector(
self.MARKUP,
"span:dir(ltr)",
['8'],
flags=util.HTML
)
def test_dir_on_input(self):
"""Test input direction rules."""
self.assert_selector(
self.MARKUP,
":is(input, textarea):dir(ltr)",
['9', '10', '11', '12', '13'],
flags=util.HTML5
)
def test_dir_on_root(self):
"""Test that the root is assumed left to right if not explicitly defined."""
self.assert_selector(
self.MARKUP,
"html:dir(ltr)",
['0'],
flags=util.HTML
)
def test_dir_auto_root(self):
"""Test that the root is assumed left to right if auto used."""
markup = """
"""
self.assert_selector(
markup,
"html:dir(ltr)",
['0'],
flags=util.HTML
)
def test_dir_on_input_root(self):
"""Test input direction when input is the root."""
markup = """"""
# Input is root
for parser in util.available_parsers('html.parser', 'lxml', 'html5lib'):
soup = self.soup(markup, parser)
fragment = soup.input.extract()
self.assertTrue(sv.match(":root:dir(ltr)", fragment, flags=sv.DEBUG))
def test_iframe(self):
"""Test direction in `iframe`."""
markup = """
"""
self.assert_selector(
markup,
"div:dir(ltr)",
['1'],
flags=util.PYHTML
)
self.assert_selector(
markup,
"div:dir(rtl)",
['2'],
flags=util.PYHTML
)
def test_xml_in_html(self):
"""Test cases for when we have XML in HTML."""
markup = """
"""
def test_is(self):
"""Test multiple selectors with "is"."""
self.assert_selector(
self.MARKUP,
":is(span, a)",
["1", "2"],
flags=util.HTML
)
def test_is_multi_comma(self):
"""Test multiple selectors but with an empty slot due to multiple commas."""
self.assert_selector(
self.MARKUP,
":is(span, , a)",
["1", "2"],
flags=util.HTML
)
def test_is_leading_comma(self):
"""Test multiple selectors but with an empty slot due to leading commas."""
self.assert_selector(
self.MARKUP,
":is(, span, a)",
["1", "2"],
flags=util.HTML
)
def test_is_trailing_comma(self):
"""Test multiple selectors but with an empty slot due to trailing commas."""
self.assert_selector(
self.MARKUP,
":is(span, a, )",
["1", "2"],
flags=util.HTML
)
def test_is_empty(self):
"""Test empty `:is()` selector list."""
self.assert_selector(
self.MARKUP,
":is()",
[],
flags=util.HTML
)
def test_nested_is(self):
"""Test multiple nested selectors."""
self.assert_selector(
self.MARKUP,
":is(span, a:is(#\\32))",
["1", "2"],
flags=util.HTML
)
self.assert_selector(
self.MARKUP,
":is(span, a:is(#\\32))",
["1", "2"],
flags=util.HTML
)
def test_is_with_other_pseudo(self):
"""Test `:is()` behavior when paired with `:not()`."""
# Each pseudo class is evaluated separately
# So this will not match
self.assert_selector(
self.MARKUP,
":is(span):not(span)",
[],
flags=util.HTML
)
def test_multiple_is(self):
"""Test `:is()` behavior when multiple `:is()` pseudo-classes are chained."""
# Each pseudo class is evaluated separately
# So this will not match
self.assert_selector(
self.MARKUP,
":is(span):is(div)",
[],
flags=util.HTML
)
# Each pseudo class is evaluated separately
# So this will match
self.assert_selector(
self.MARKUP,
":is(a):is(#\\32)",
['2'],
flags=util.HTML
)
def test_invalid_pseudo_class_start_combinator(self):
"""Test invalid start combinator in pseudo-classes other than `:has()`."""
self.assert_raises(':is(> div)', SelectorSyntaxError)
self.assert_raises(':is(div, > div)', SelectorSyntaxError)
def test_invalid_pseudo_orphan_close(self):
"""Test invalid, orphaned pseudo close."""
self.assert_raises('div)', SelectorSyntaxError)
def test_invalid_pseudo_open(self):
"""Test invalid pseudo open (missing close)."""
self.assert_raises(':is(div', SelectorSyntaxError)
soupsieve-2.7/tests/test_level4/test_lang.py
"""Test language selectors."""
from .. import util
class TestLang(util.TestCase):
"""Test language selectors."""
MARKUP = """
"""
def test_lang(self):
"""Test language and that it uses implicit wildcard."""
# Implicit wild
self.assert_selector(
self.MARKUP,
"p:lang(de-DE)",
['1', '2', '3', '4', '5', '6'],
flags=util.HTML
)
def test_lang_missing_range(self):
"""Test language range with a missing range."""
# Missing range (empty subtag), so nothing matches
self.assert_selector(
self.MARKUP,
"p:lang(de--DE)",
[],
flags=util.HTML
)
def test_explicit_wildcard(self):
"""Test language with explicit wildcard (same as implicit)."""
# Explicit wild
self.assert_selector(
self.MARKUP,
"p:lang(de-\\*-DE)",
['1', '2', '3', '4', '5', '6'],
flags=util.HTML
)
def test_only_wildcard(self):
"""Test language with only a wildcard."""
self.assert_selector(
self.MARKUP,
"p:lang('*')",
['1', '2', '3', '4', '5', '6', '7', '8', '9'],
flags=util.HTML
)
def test_wildcard_start_no_match(self):
"""Test language with a wildcard at start, but it matches nothing."""
self.assert_selector(
self.MARKUP,
"p:lang('*-de-DE')",
[],
flags=util.HTML
)
def test_wildcard_start_collapse(self):
"""Test that language with multiple wildcard patterns at start collapse."""
self.assert_selector(
self.MARKUP,
"p:lang('*-*-*-DE')",
['1', '2', '3', '4', '5', '6', '7'],
flags=util.HTML
)
def test_wildcard_at_start_escaped(self):
"""
Test language with wildcard at start (escaped).
Wildcard in the middle is same as implicit, but at the start, it has specific meaning.
"""
self.assert_selector(
self.MARKUP,
"p:lang(\\*-DE)",
['1', '2', '3', '4', '5', '6', '7'],
flags=util.HTML
)
def test_language_quoted(self):
"""Test language (quoted)."""
# Normal quoted
self.assert_selector(
self.MARKUP,
"p:lang('de-DE')",
['1', '2', '3', '4', '5', '6'],
flags=util.HTML
)
def test_language_quoted_with_escaped_newline(self):
"""Test language (quoted) with escaped new line."""
# Quoted with an escaped new line
self.assert_selector(
self.MARKUP,
"p:lang('de-\\\nDE')",
['1', '2', '3', '4', '5', '6'],
flags=util.HTML
)
def test_wildcard_at_start_quoted(self):
"""Test language with wildcard at start (quoted)."""
# First wild quoted
self.assert_selector(
self.MARKUP,
"p:lang('*-DE')",
['1', '2', '3', '4', '5', '6', '7'],
flags=util.HTML
)
def test_avoid_implicit_language(self):
"""Test that we can narrow language selection to elements that match and explicitly state language."""
# Target element with language and language attribute
self.assert_selector(
self.MARKUP,
"p[lang]:lang(de-DE)",
['6'],
flags=util.HTML
)
def test_language_und(self):
"""Test that undefined language can be matched by `*`."""
markup = """
"""
self.assert_selector(
markup,
"div:lang('*')",
['2'],
flags=util.HTML
)
def test_language_empty_string(self):
"""Test that an empty string language will only match untagged languages `lang=""`."""
markup = """
"""
self.assert_selector(
markup,
"div:lang('')",
['1', '3', '4'],
flags=util.HTML
)
def test_language_list(self):
"""Test language list."""
# Multiple languages
markup = """
"""
self.assert_selector(
markup,
"p:lang(en)",
[],
flags=util.HTML
)
def test_language_in_header(self):
"""Test that we can find language in header."""
markup = """
"""
self.assert_selector(
markup,
"p:lang('*-US')",
['1', '2'],
flags=util.HTML
)
def test_xml_style_language_in_html5(self):
"""Test XML style language when out of HTML5 namespace."""
markup = """
"""
self.assert_selector(
markup,
"mtext:lang(en)",
['1'],
flags=util.HTML5
)
def test_xml_style_language(self):
"""Test XML style language."""
# XML style language
markup = """
"""
self.assert_selector(
markup,
"p:lang(de-DE)",
['1', '2', '3', '4', '5', '6'],
flags=util.XML
)
def test_language_in_xhtml_without_html_style_lang(self):
"""
Test language in XHTML.
HTML namespace elements must use HTML style language.
"""
# XHTML language: `lang`
markup = """
"""
self.assert_selector(
markup,
"p:lang(de-DE)",
[],
flags=util.XHTML
)
soupsieve-2.7/tests/test_level4/test_local_link.py
"""Test local link selectors."""
from .. import util
class TestLocalLink(util.TestCase):
"""Test local link selectors."""
MARKUP = """
Link
Another link
"""
def test_local_link(self):
"""Test local link (matches nothing)."""
self.assert_selector(
self.MARKUP,
"a:local-link",
[],
flags=util.HTML
)
def test_not_local_link(self):
"""Test not local link."""
self.assert_selector(
self.MARKUP,
"a:not(:local-link)",
["1", "2"],
flags=util.HTML
)
soupsieve-2.7/tests/test_level4/test_matches.py
"""Test matches selectors."""
from .. import util
class TestMatches(util.TestCase):
"""Test matches selectors."""
MARKUP = """
"""
def test_scope_is_root(self):
"""Test scope is the root when a specific element is not the target of the select call."""
# Scope is root when applied to a document node
self.assert_selector(
self.MARKUP,
":scope",
["root"],
flags=util.HTML
)
self.assert_selector(
self.MARKUP,
":scope > body > div",
["div"],
flags=util.HTML
)
def test_scope_cannot_select_target(self):
"""Test that scope, the element which scope is called on, cannot be selected."""
for parser in util.available_parsers(
'html.parser', 'lxml', 'html5lib', 'xml'):
soup = self.soup(self.MARKUP, parser)
el = soup.html
# Scope is the element we are applying the select to, and that element is never returned
self.assertTrue(len(sv.select(':scope', el, flags=sv.DEBUG)) == 0)
def test_scope_is_select_target(self):
"""Test that scope is the element which scope is called on."""
for parser in util.available_parsers(
'html.parser', 'lxml', 'html5lib', 'xml'):
soup = self.soup(self.MARKUP, parser)
el = soup.html
# Scope here means the current element under select
ids = [el.attrs['id'] for el in sv.select(':scope div', el, flags=sv.DEBUG)]
self.assertEqual(sorted(ids), sorted(['div']))
el = soup.body
ids = [el.attrs['id'] for el in sv.select(':scope div', el, flags=sv.DEBUG)]
self.assertEqual(sorted(ids), sorted(['div']))
# `div` is the current element under select, and it has no `div` elements.
el = soup.div
ids = [el.attrs['id'] for el in sv.select(':scope div', el, flags=sv.DEBUG)]
self.assertEqual(sorted(ids), sorted([]))
# `div` does have an element with the class `.wordshere`
ids = [el.attrs['id'] for el in sv.select(':scope .wordshere', el, flags=sv.DEBUG)]
self.assertEqual(sorted(ids), sorted(['pre']))
soupsieve-2.7/tests/test_level4/test_target_within.py
"""Test target within selectors."""
from .. import util
class TestTargetWithin(util.TestCase):
"""Test target within selectors."""
MARKUP = """
Jump
Header 1
content
Header 2
content
"""
def test_target_within(self):
"""Test target within."""
self.assert_selector(
self.MARKUP,
"article:target-within",
[],
flags=util.HTML
)
def test_not_target_within(self):
"""Test inverse of target within."""
self.assert_selector(
self.MARKUP,
"article:not(:target-within)",
["article"],
flags=util.HTML
)
soupsieve-2.7/tests/test_level4/test_user_invalid.py
"""Test invalid selectors."""
from .. import util
class TestInvalid(util.TestCase):
"""Test invalid selectors."""
def test_user_invalid(self):
"""Test user invalid (matches nothing)."""
markup = """
"""
self.assert_selector(
markup,
"input:user-invalid",
[],
flags=util.HTML
)
self.assert_selector(
markup,
"input:not(:user-invalid)",
["1"],
flags=util.HTML
)
soupsieve-2.7/tests/test_level4/test_where.py
"""Test where selectors."""
from .. import util
class TestWhere(util.TestCase):
"""Test where selectors."""
MARKUP = """
"""
def test_amp_is_root(self):
"""Test ampersand is the root when a specific element is not the target of the select call."""
# Scope is root when applied to a document node
self.assert_selector(
self.MARKUP,
"&",
["root"],
flags=util.HTML
)
self.assert_selector(
self.MARKUP,
"& > body > div",
["div"],
flags=util.HTML
)
def test_amp_cannot_select_target(self):
"""Test that ampersand, the element which scope is called on, cannot be selected."""
for parser in util.available_parsers(
'html.parser', 'lxml', 'html5lib', 'xml'):
soup = self.soup(self.MARKUP, parser)
el = soup.html
# Scope is the element we are applying the select to, and that element is never returned
self.assertTrue(len(sv.select('&', el, flags=sv.DEBUG)) == 0)
def test_amp_is_select_target(self):
"""Test that ampersand is the element which scope is called on."""
for parser in util.available_parsers(
'html.parser', 'lxml', 'html5lib', 'xml'):
soup = self.soup(self.MARKUP, parser)
el = soup.html
# Scope here means the current element under select
ids = [el.attrs['id'] for el in sv.select('& div', el, flags=sv.DEBUG)]
self.assertEqual(sorted(ids), sorted(['div']))
el = soup.body
ids = [el.attrs['id'] for el in sv.select('& div', el, flags=sv.DEBUG)]
self.assertEqual(sorted(ids), sorted(['div']))
# `div` is the current element under select, and it has no `div` elements.
el = soup.div
ids = [el.attrs['id'] for el in sv.select('& div', el, flags=sv.DEBUG)]
self.assertEqual(sorted(ids), sorted([]))
# `div` does have an element with the class `.wordshere`
ids = [el.attrs['id'] for el in sv.select('& .wordshere', el, flags=sv.DEBUG)]
self.assertEqual(sorted(ids), sorted(['pre']))
soupsieve-2.7/.gitignore
.DS_Store
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
.python-version
# celery beat schedule file
celerybeat-schedule
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# Patches
*.patch
soupsieve-2.7/LICENSE.md
MIT License
Copyright (c) 2018 - 2025 Isaac Muse
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
soupsieve-2.7/README.md
[![Donate via PayPal][donate-image]][donate-link]
[![Build][github-ci-image]][github-ci-link]
[![Coverage Status][codecov-image]][codecov-link]
[![PyPI Version][pypi-image]][pypi-link]
[![PyPI Downloads][pypi-down]][pypi-link]
[![PyPI - Python Version][python-image]][pypi-link]
[![License][license-image-mit]][license-link]
# Soup Sieve
## Overview
Soup Sieve is a CSS selector library designed to be used with [Beautiful Soup 4][bs4]. It aims to provide selecting,
matching, and filtering using modern CSS selectors. Soup Sieve currently provides selectors from the CSS level 1
specifications up through the latest CSS level 4 drafts and beyond (though some are not yet implemented).
Soup Sieve was written with the intent to replace Beautiful Soup's builtin select feature, and as of Beautiful Soup
version 4.7.0, it now is :confetti_ball:. Soup Sieve can also be imported in order to use its API directly for
more controlled, specialized parsing.
Soup Sieve has implemented most of the CSS selectors up through the latest CSS draft specifications, though there are a
number that don't make sense in a non-browser environment. Selectors that cannot provide meaningful functionality simply
do not match anything. Some of the supported selectors are:
- `.classes`
- `#ids`
- `[attributes=value]`
- `parent child`
- `parent > child`
- `sibling ~ sibling`
- `sibling + sibling`
- `:not(element.class, element2.class)`
- `:is(element.class, element2.class)`
- `parent:has(> child)`
- and [many more](https://facelessuser.github.io/soupsieve/selectors/)
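As a rough sketch of how a few of these selectors behave, the snippet below runs them both through Beautiful Soup's
own `select` method and through the `soupsieve` API directly. The markup and `id` values are invented for
illustration, and Beautiful Soup 4.7.0 or later is assumed so that `soup.select` is backed by Soup Sieve.

```python
from bs4 import BeautifulSoup
import soupsieve as sv

# Hypothetical markup used only for this illustration.
html = """
<ul id="menu">
  <li id="a" class="item">A</li>
  <li id="b" class="item active">B</li>
  <li id="c">C</li>
</ul>
<div id="wrap"><p id="p">text</p></div>
"""
soup = BeautifulSoup(html, "html.parser")

# Through Beautiful Soup: items that are not active.
via_bs4 = [el["id"] for el in soup.select("li.item:not(.active)")]

# Through the Soup Sieve API: later siblings of the active item.
siblings = [el["id"] for el in sv.select("li.active ~ li", soup)]

# Elements that have a `p` element as a direct child.
parents = [el["id"] for el in sv.select(":has(> p)", soup)]

# Elements matching either selector in the `:is()` list.
either = [el["id"] for el in sv.select(":is(li.active, p)", soup)]

print(via_bs4, siblings, parents, either)
```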
## Installation
You must have Beautiful Soup already installed:
```
pip install beautifulsoup4
```
In most cases, assuming you've installed version 4.7.0 or later, that should be all you need to do, but if you've
installed via some alternative method, and Soup Sieve is not automatically installed, you can install it directly:
```
pip install soupsieve
```
If you want to manually install it from source, first ensure that [`build`](https://pypi.org/project/build/) is
installed:
```
pip install build
```
Then navigate to the root of the project and build the wheel and install (replacing `<ver>` with the current version):
```
python -m build -w
pip install dist/soupsieve-<ver>-py3-none-any.whl
```
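One quick way to confirm that both packages ended up in the environment is to import them and print their versions.
This is only a sanity check under the assumption that the install steps above succeeded; it does not validate any
particular version.

```python
import bs4
import soupsieve

# Both packages expose a version string at module level.
versions = (bs4.__version__, soupsieve.__version__)
print("Beautiful Soup %s, Soup Sieve %s" % versions)
```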
## Documentation
Documentation is found here: https://facelessuser.github.io/soupsieve/.
## License
MIT
[bs4]: https://beautiful-soup-4.readthedocs.io/en/latest/#
[github-ci-image]: https://github.com/facelessuser/soupsieve/workflows/build/badge.svg
[github-ci-link]: https://github.com/facelessuser/soupsieve/actions?query=workflow%3Abuild+branch%3Amain
[codecov-image]: https://img.shields.io/codecov/c/github/facelessuser/soupsieve/master.svg?logo=codecov&logoColor=aaaaaa&labelColor=333333
[codecov-link]: https://codecov.io/github/facelessuser/soupsieve
[pypi-image]: https://img.shields.io/pypi/v/soupsieve.svg?logo=pypi&logoColor=aaaaaa&labelColor=333333
[pypi-down]: https://img.shields.io/pypi/dm/soupsieve.svg?logo=pypi&logoColor=aaaaaa&labelColor=333333
[pypi-link]: https://pypi.python.org/pypi/soupsieve
[python-image]: https://img.shields.io/pypi/pyversions/soupsieve?logo=python&logoColor=aaaaaa&labelColor=333333
[license-image-mit]: https://img.shields.io/badge/license-MIT-blue.svg?labelColor=333333
[license-link]: https://github.com/facelessuser/soupsieve/blob/main/LICENSE.md
[donate-image]: https://img.shields.io/badge/Donate-PayPal-3fabd1?logo=paypal
[donate-link]: https://www.paypal.me/facelessuser
soupsieve-2.7/hatch_build.py
"""Dynamically define some metadata."""
import os
from hatchling.metadata.plugin.interface import MetadataHookInterface
def get_version_dev_status(root):
    """Get version_info without importing the entire module."""
    import importlib.util
    path = os.path.join(root, "soupsieve", "__meta__.py")
    spec = importlib.util.spec_from_file_location("__meta__", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.__version_info__._get_dev_status()
class CustomMetadataHook(MetadataHookInterface):
    """Our metadata hook."""
    def update(self, metadata):
        """See https://ofek.dev/hatch/latest/plugins/metadata-hook/ for more information."""
        metadata["classifiers"] = [
            f"Development Status :: {get_version_dev_status(self.root)}",
            'Environment :: Console',
            'Intended Audience :: Developers',
            'License :: OSI Approved :: MIT License',
            'Operating System :: OS Independent',
            'Programming Language :: Python :: 3',
            'Programming Language :: Python :: 3.8',
            'Programming Language :: Python :: 3.9',
            'Programming Language :: Python :: 3.10',
            'Programming Language :: Python :: 3.11',
            'Programming Language :: Python :: 3.12',
            'Programming Language :: Python :: 3.13',
            'Topic :: Internet :: WWW/HTTP :: Dynamic Content',
            'Topic :: Software Development :: Libraries :: Python Modules',
            'Typing :: Typed'
        ]
soupsieve-2.7/pyproject.toml
[build-system]
requires = [
"hatchling>=0.21.1",
]
build-backend = "hatchling.build"
[project]
name = "soupsieve"
description = "A modern CSS selector implementation for Beautiful Soup."
readme = "README.md"
license = "MIT"
requires-python = ">=3.8"
authors = [
{ name = "Isaac Muse", email = "Isaac.Muse@gmail.com" },
]
keywords = [
"CSS",
"HTML",
"XML",
"selector",
"filter",
"query",
"soup"
]
dynamic = [
"classifiers",
"version",
]
[project.urls]
Homepage = "https://github.com/facelessuser/soupsieve"
[tool.hatch.version]
source = "code"
path = "soupsieve/__meta__.py"
[tool.hatch.build.targets.wheel]
include = [
"/soupsieve",
]
[tool.hatch.build.targets.sdist]
include = [
"/docs/src/markdown/**/*.md",
"/docs/src/markdown/**/*.gif",
"/docs/src/markdown/**/*.png",
"/docs/src/markdown/dictionary/*.txt",
"/docs/theme/**/*.css",
"/docs/theme/**/*.js",
"/docs/theme/**/*.html",
"/requirements/*.txt",
"/soupsieve/**/*.py",
"/soupsieve/py.typed",
"/tests/**/*.py",
"/.pyspelling.yml",
"/.coveragerc",
"/mkdocs.yml"
]
[tool.mypy]
files = [
"soupsieve"
]
strict = true
show_error_codes = true
[tool.hatch.metadata.hooks.custom]
[tool.ruff]
line-length = 120
lint.select = [
"A", # flake8-builtins
"B", # flake8-bugbear
"D", # pydocstyle
"C4", # flake8-comprehensions
"N", # pep8-naming
"E", # pycodestyle
"F", # pyflakes
"PGH", # pygrep-hooks
"RUF", # ruff
# "UP", # pyupgrade
"W", # pycodestyle
"YTT", # flake8-2020,
"PERF" # Perflint
]
lint.ignore = [
"E741",
"D202",
"D401",
"D212",
"D203",
"N802",
"N801",
"N803",
"N806",
"N818",
"RUF012",
"RUF005",
"PGH004",
"RUF100",
"RUF022",
"RUF023"
]
[tool.tox]
legacy_tox_ini = """
[tox]
isolated_build = true
envlist =
    py{38,39,310,311,312},
    lint, nolxml, nohtml5lib
[testenv]
passenv = *
deps =
    -rrequirements/tests.txt
commands =
    mypy
    pytest --cov soupsieve --cov-append {toxinidir}
    coverage html -d {envtmpdir}/coverage
    coverage xml
    coverage report --show-missing
[testenv:documents]
passenv = *
deps =
    -rrequirements/docs.txt
commands =
    mkdocs build --clean --verbose --strict
    pyspelling -j 8
[testenv:lint]
passenv = *
deps =
    -rrequirements/lint.txt
commands =
    "{envbindir}"/ruff check .
[testenv:nolxml]
passenv = *
deps =
    -rrequirements/tests-nolxml.txt
commands =
    pytest {toxinidir}
[testenv:nohtml5lib]
passenv = *
deps =
    -rrequirements/tests-nohtml5lib.txt
commands =
    pytest {toxinidir}
[pytest]
filterwarnings =
    ignore:\nCSS selector pattern:UserWarning
"""
[tool.pytest.ini_options]
filterwarnings = [
"ignore:The 'strip_cdata':DeprecationWarning"
]
soupsieve-2.7/PKG-INFO
Metadata-Version: 2.4
Name: soupsieve
Version: 2.7
Summary: A modern CSS selector implementation for Beautiful Soup.
Project-URL: Homepage, https://github.com/facelessuser/soupsieve
Author-email: Isaac Muse
License-Expression: MIT
License-File: LICENSE.md
Keywords: CSS,HTML,XML,filter,query,selector,soup
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Internet :: WWW/HTTP :: Dynamic Content
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.8
Description-Content-Type: text/markdown
[![Donate via PayPal][donate-image]][donate-link]
[![Build][github-ci-image]][github-ci-link]
[![Coverage Status][codecov-image]][codecov-link]
[![PyPI Version][pypi-image]][pypi-link]
[![PyPI Downloads][pypi-down]][pypi-link]
[![PyPI - Python Version][python-image]][pypi-link]
[![License][license-image-mit]][license-link]
# Soup Sieve
## Overview
Soup Sieve is a CSS selector library designed to be used with [Beautiful Soup 4][bs4]. It aims to provide selecting,
matching, and filtering using modern CSS selectors. Soup Sieve currently provides selectors from the CSS level 1
specifications up through the latest CSS level 4 drafts and beyond (though some are not yet implemented).
Soup Sieve was written with the intent to replace Beautiful Soup's builtin select feature, and as of Beautiful Soup
version 4.7.0, it now is :confetti_ball:. Soup Sieve can also be imported in order to use its API directly for
more controlled, specialized parsing.
Soup Sieve has implemented most of the CSS selectors up through the latest CSS draft specifications, though there are a
number that don't make sense in a non-browser environment. Selectors that cannot provide meaningful functionality simply
do not match anything. Some of the supported selectors are:
- `.classes`
- `#ids`
- `[attributes=value]`
- `parent child`
- `parent > child`
- `sibling ~ sibling`
- `sibling + sibling`
- `:not(element.class, element2.class)`
- `:is(element.class, element2.class)`
- `parent:has(> child)`
- and [many more](https://facelessuser.github.io/soupsieve/selectors/)
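The following is a small illustrative sketch of several of the selectors listed above, run through Beautiful Soup's `select` method (which uses Soup Sieve as of Beautiful Soup 4.7.0). The HTML fragment and variable names are invented for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, used only for illustration.
html = """
<div id="main">
  <p class="a">Cat</p>
  <p class="b">Dog</p>
  <span class="c">Mouse</span>
</div>
<div id="empty"></div>
"""
soup = BeautifulSoup(html, "html.parser")

# ID, child combinator, and class selectors combined.
cats = [el.get_text() for el in soup.select("#main > p.a")]

# :is() matches any selector in its list; :not() excludes all of them.
is_match = [el.get_text() for el in soup.select("#main > :is(p.a, span.c)")]
not_match = [el.get_text() for el in soup.select("#main > :not(p.a, span.c)")]

# Sibling combinators: any later sibling (~) and the next sibling (+).
later_siblings = [el.get_text() for el in soup.select("p.a ~ span")]
next_sibling = [el.get_text() for el in soup.select("p.a + p")]

# :has() with a relative selector picks parents by their children.
parents = [el["id"] for el in soup.select("div:has(> p)")]

print(cats, is_match, not_match, later_siblings, next_sibling, parents)
```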
## Installation
You must have Beautiful Soup already installed:
```
pip install beautifulsoup4
```
In most cases, assuming you've installed version 4.7.0 or later, that should be all you need to do. If you've installed
via some alternative method and Soup Sieve was not automatically installed, you can install it directly:
```
pip install soupsieve
```
If you want to manually install it from source, first ensure that [`build`](https://pypi.org/project/build/) is
installed:
```
pip install build
```
Then navigate to the root of the project, build the wheel, and install it (replacing `<ver>` with the current version):
```
python -m build -w
pip install dist/soupsieve-<ver>-py3-none-any.whl
```
## Documentation
Documentation is found here: https://facelessuser.github.io/soupsieve/.
## License
MIT
[bs4]: https://beautiful-soup-4.readthedocs.io/en/latest/#
[github-ci-image]: https://github.com/facelessuser/soupsieve/workflows/build/badge.svg
[github-ci-link]: https://github.com/facelessuser/soupsieve/actions?query=workflow%3Abuild+branch%3Amain
[codecov-image]: https://img.shields.io/codecov/c/github/facelessuser/soupsieve/master.svg?logo=codecov&logoColor=aaaaaa&labelColor=333333
[codecov-link]: https://codecov.io/github/facelessuser/soupsieve
[pypi-image]: https://img.shields.io/pypi/v/soupsieve.svg?logo=pypi&logoColor=aaaaaa&labelColor=333333
[pypi-down]: https://img.shields.io/pypi/dm/soupsieve.svg?logo=pypi&logoColor=aaaaaa&labelColor=333333
[pypi-link]: https://pypi.python.org/pypi/soupsieve
[python-image]: https://img.shields.io/pypi/pyversions/soupsieve?logo=python&logoColor=aaaaaa&labelColor=333333
[license-image-mit]: https://img.shields.io/badge/license-MIT-blue.svg?labelColor=333333
[license-link]: https://github.com/facelessuser/soupsieve/blob/main/LICENSE.md
[donate-image]: https://img.shields.io/badge/Donate-PayPal-3fabd1?logo=paypal
[donate-link]: https://www.paypal.me/facelessuser