Composite Selectors

Composite selectors allow you to combine multiple selectors into one for more refined search criteria.
Various ways of combining selectors are described in this tutorial.

Combinators

Inspired by CSS, combinators in soupsavvy allow you to define relationships between multiple selectors.
For more information on CSS combinators, you can refer to Mozilla.

Operators

Combinators can be created using operators as a more concise alternative. Each combinator has a corresponding operator that defines the relationship between two selectors.

Combinator(left, right) == left {operator} right

For example, the >> operator can be used as a shorthand for DescendantCombinator:

DescendantCombinator(left, right) == left >> right

from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, to_soupsavvy

soup = BeautifulSoup(
    """
        <p class="price">Price: $30</p>
        <div class="book">
            <span class="title">Animal Farm</span>
            <span class="price_section">
                <p class="price">Price: $20</p>
            </span>
        </div>
    """,
    features="lxml",
)
element = to_soupsavvy(soup)

selector = ClassSelector("book") >> ClassSelector("price")
selector.find(element)

SoupElement(<p class="price">Price: $20</p>)

Multiple Selectors

Combinators allow you to chain any number of selectors when they are passed as positional arguments.

Combinator(first, second, third)

from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, DescendantCombinator, IdSelector, to_soupsavvy

soup = BeautifulSoup(
    """
        <p class="price">Price: $30</p>
        <div class="book">
            <span class="title">Animal Farm</span>
            <p class="price">Price: $10</p>
        </div>
        <div id="available">
            <div class="book">
                <span class="title">Animal Farm</span>
                <span class="price_section">
                    <p class="price">Price: $20</p>
                </span>
            </div>
        </div>
    """,
    features="lxml",
)
element = to_soupsavvy(soup)

selector = DescendantCombinator(
    IdSelector("available"),
    ClassSelector("book"),
    ClassSelector("price"),
)
selector.find(element)

SoupElement(<p class="price">Price: $20</p>)

Combinators Equality

Two combinators are considered equal only if they are of the same type and contain the exact same selectors in the same order. The order of selectors is significant:

left {operator} right != right {operator} left

from soupsavvy import ClassSelector

book_selector = ClassSelector("book")
price_selector = ClassSelector("price")

print(
    "left >> right == right >> left:",
    price_selector >> book_selector == book_selector >> price_selector,
)

left >> right == right >> left: False

Non-Recursive

For combinators, setting recursive=False ensures that elements are returned only if the element matched by the first selector is a direct child of the searched element.

from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, to_soupsavvy

soup = BeautifulSoup(
    """
        <p class="price">Price: $30</p>
        <span class="not_child_book">
            <div class="book">
                <span class="title">Animal Farm</span>
                <span class="price_section">
                    <p class="price">Price: $50</p>
                </span>
            </div>
        </span>
        <div class="book">
            <span class="title">Animal Farm</span>
            <span class="price_section">
                <p class="price">Price: $20</p>
            </span>
        </div>
    """,
    features="html.parser",
)
element = to_soupsavvy(soup)

selector = ClassSelector("book") >> ClassSelector("price")
selector.find(element, recursive=False)

SoupElement(<p class="price">Price: $20</p>)

Combining Combinators

Combinators can be combined to replicate complex CSS relationships, like:

#available > div .price

This is achieved using ChildCombinator and DescendantCombinator together.

Precedence Caveats

Note that some operators have higher precedence than others, which can affect the order in which expressions are evaluated.

left > middle >> right

The >> (DescendantCombinator) takes precedence over > (ChildCombinator), resulting in:

ChildCombinator(left, DescendantCombinator(middle, right))

Use parentheses to adjust precedence as needed.

from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, IdSelector, TypeSelector, to_soupsavvy

soup = BeautifulSoup(
    """
        <p class="price">Price: $30</p>
        <div>
            <span class="title">Animal Farm</span>
            <p class="price">Price: $10</p>
        </div>
        <div id="available">
            <div>
                <span class="title">Animal Farm</span>
                <span class="discount">
                    <h2>Discounted</h2>
                    <p class="price">Price: $15</p>
                </span>
                <p class="price">Price: $20</p>
            </div>
        </div>
    """,
    features="lxml",
)
element = to_soupsavvy(soup)

selector = (IdSelector("available") > TypeSelector("div")) >> ClassSelector("price")
selector.find(element)

SoupElement(<p class="price">Price: $15</p>)
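
The precedence rule described above comes from Python's grammar rather than from soupsavvy, so it can be checked without the library. The sketch below uses a hypothetical Node class (an assumption for illustration, not part of soupsavvy) whose > and >> operators only record how Python grouped the expression.

```python
class Node:
    """Toy stand-in for a selector; it only records operator grouping."""

    def __init__(self, label):
        self.label = label

    def __gt__(self, other):
        # Mirrors the role of the > operator (child relationship).
        return Node(f"Child({self.label}, {other.label})")

    def __rshift__(self, other):
        # Mirrors the role of the >> operator (descendant relationship).
        return Node(f"Descendant({self.label}, {other.label})")


left, middle, right = Node("left"), Node("middle"), Node("right")

# Without parentheses, >> binds tighter than >.
default_grouping = (left > middle >> right).label
# Parentheses force the child relationship to be built first.
explicit_grouping = ((left > middle) >> right).label

print(default_grouping)   # Child(left, Descendant(middle, right))
print(explicit_grouping)  # Descendant(Child(left, middle), right)
```

The second form has the same grouping as the parenthesized selector used in the example above.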

DescendantCombinator

The descendant combinator is one of the simplest and most frequently used combinators in CSS. It defines the relationship between two selectors, where the second selector matches descendants of the element matched by the first selector. In CSS, this relationship is represented by a single space " " between two selectors.

CSS Example:

.book .price

This matches all tags with the class price that are descendants of tags with the class book.

Operator: >>

Reference: Mozilla

from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, to_soupsavvy

soup = BeautifulSoup(
    """
        <p class="price">Price: $30</p>
        <div class="book">
            <span class="title">Animal Farm</span>
            <span class="price_section">
                <p class="price">Price: $20</p>
            </span>
        </div>
    """,
    features="lxml",
)
element = to_soupsavvy(soup)

selector = ClassSelector("book") >> ClassSelector("price")
selector.find(element)

SoupElement(<p class="price">Price: $20</p>)

ChildCombinator

Defines the relationship between two selectors, where the second selector matches only the direct children of the element matched by the first selector.

CSS Example:

div > .price

Operator: >

Reference: Mozilla

from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, TypeSelector, to_soupsavvy

soup = BeautifulSoup(
    """
        <p class="price">Price: $30</p>
        <div>
            <span class="title">Animal Farm</span>
            <span class="discount">
                <h2>Discounted</h2>
                <p class="price">Price: $15</p>
            </span>
            <p class="price">Price: $20</p>
        </div>
    """,
    features="lxml",
)
element = to_soupsavvy(soup)

selector = TypeSelector("div") > ClassSelector("price")
selector.find(element)

SoupElement(<p class="price">Price: $20</p>)

NextSiblingCombinator

Defines the relationship between two selectors, where the second selector matches the immediate sibling that directly follows the element matched by the first selector.

CSS Example:

div + .price

Operator: +

Reference: Mozilla

from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, PatternSelector, to_soupsavvy

soup = BeautifulSoup(
    """
        <h2>Discounted</h2>
        <span>Unavailable</span>
        <p class="price">Price: $10</p>
        <h1>Discounted</h1>
        <p class="price">Price: $20</p>
    """,
    features="lxml",
)
element = to_soupsavvy(soup)

selector = PatternSelector("Discounted") + ClassSelector("price")
selector.find(element)

SoupElement(<p class="price">Price: $20</p>)

SubsequentSiblingCombinator

Defines the relationship between two selectors, where the second selector matches all siblings that follow the element matched by the first selector.

CSS Example:

div ~ .price

Operator: *

Reference: Mozilla

from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, TypeSelector, to_soupsavvy

soup = BeautifulSoup(
    """
        <p class="price">Price: $25</p>
        <h2>Discounted</h2>
        <span>Bargain!!!</span>
        <p class="price">Price: $15</p>
        <p class="price">Price: $10</p>
    """,
    features="lxml",
)
element = to_soupsavvy(soup)

selector = TypeSelector("h2") * ClassSelector("price")
selector.find_all(element)

[SoupElement(<p class="price">Price: $15</p>),
 SoupElement(<p class="price">Price: $10</p>)]

ParentCombinator

Defines the relationship between two selectors, where the second selector matches elements that are the parent of an element matched by the first selector.

CSS Example:

.discount:has(> p)

Operator: <

from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, TypeSelector, to_soupsavvy

soup = BeautifulSoup(
    """
        <p class="price">Price: $15</p>
        <span class="book">
            <p class="price">Price: $25</p>
        </span>
        <span class="discount"></span>
        <span class="discount">
            <div>
                <p class="price">Price: $35</p>
            </div>
        </span>
        <span class="discount">
            <p class="price">Price: $10</p>
        </span>
    """,
    features="lxml",
)
element = to_soupsavvy(soup)

selector = TypeSelector("p") < ClassSelector("discount")
result = selector.find(element)
print(result)

<span class="discount">
<p class="price">Price: $10</p>
</span>

AncestorCombinator

Defines the relationship between two selectors, where the second selector matches elements that are an ancestor of an element matched by the first selector.

CSS Example:

.discount:has(p)

Operator: <<

from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, TypeSelector, to_soupsavvy

soup = BeautifulSoup(
    """
        <p class="price">Price: $15</p>
        <span class="book">
            <p class="price">Price: $25</p>
        </span>
        <span class="discount"></span>
        <span class="discount">
            <div><p class="price">Price: $35</p></div>
        </span>
        <span class="discount">
            <p class="price">Price: $10</p>
        </span>
    """,
    features="lxml",
)
element = to_soupsavvy(soup)

selector = TypeSelector("p") << ClassSelector("discount")
print("\n\n".join(str(element) for element in selector.find_all(element)))

<span class="discount">
<div><p class="price">Price: $35</p></div>
</span>

<span class="discount">
<p class="price">Price: $10</p>
</span>

Logical Selectors

These selectors allow you to create new selectors by combining multiple selectors using logical operators such as AND, OR, NOT, and XOR.

Equality

Logical selectors, unlike combinators, are commutative. This means that the order of selectors within a logical selector does not affect the result.

first & second == second & first

from soupsavvy import ClassSelector

discount_selector = ClassSelector("discount")
price_selector = ClassSelector("price")

print(
    "left & right == right & left:",
    discount_selector & price_selector == price_selector & discount_selector,
)

left & right == right & left: True

Additionally, two instances can be considered equal even if they contain a different number of selectors, as long as they represent the same criteria.

from soupsavvy import AttributeSelector, ClassSelector, SelectorList

discount_selector = ClassSelector("discount")
price_selector = ClassSelector("price")
another_price_selector = AttributeSelector("class", value="price")

print(
    SelectorList(discount_selector, price_selector)
    == SelectorList(discount_selector, price_selector, another_price_selector)
)

True

AndSelector

AndSelector corresponds to the CSS compound selector, which is a concatenation of multiple selectors. It selects elements that match all of the specified selectors.

CSS Example:

p.price

Operator: &

Reference: Mozilla

from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, TypeSelector, to_soupsavvy

soup = BeautifulSoup(
    """
        <p class="title">Animal Farm</p>
        <span class="price">Price: $30</span>
        <p class="price">Price: $20</p>
    """,
    features="lxml",
)
element = to_soupsavvy(soup)

selector = TypeSelector("p") & ClassSelector("price")
print(selector.find(element))

<p class="price">Price: $20</p>

SelectorList

SelectorList corresponds to the CSS selector list. It selects elements that match any of the specified selectors.

CSS Example:

h1, h2

Operator: |

Aliases: OrSelector

Reference: Mozilla

from bs4 import BeautifulSoup

from soupsavvy import TypeSelector, to_soupsavvy

soup = BeautifulSoup(
    """
        <h1>Hello World</h1>
        <span>Extra information</span>
        <h2>Goodbye World</h2>
        <h3>Not interested</h3>
    """,
    features="lxml",
)
element = to_soupsavvy(soup)

selector = TypeSelector("h1") | TypeSelector("h2")
selector.find_all(element)

[SoupElement(<h1>Hello World</h1>), SoupElement(<h2>Goodbye World</h2>)]

NotSelector

NotSelector corresponds to the CSS :not() pseudo-class, which excludes elements that match a specified selector. It allows you to select elements that do not meet certain criteria.

CSS Example:

:not(.discount)

Operator: ~

NotSelector(selector) == ~selector

Reference: Mozilla

from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, to_soupsavvy

soup = BeautifulSoup(
    """
        <p class="price discount">Price: €10</p>
        <p class="price">Price: $20</p>
    """,
    features="html.parser",
)
element = to_soupsavvy(soup)

selector = ~ClassSelector("discount")
selector.find(element)

SoupElement(<p class="price">Price: $20</p>)

Multiple Selectors

When more than one selector is passed to NotSelector, it selects elements that do not match any of them. Alternatively, a SelectorList can be negated to exclude multiple selectors:

NotSelector(left, right) == ~(left | right)

import re

from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, NotSelector, PatternSelector, to_soupsavvy

soup = BeautifulSoup(
    """
        <p class="price discount">Price: €10</p>
        <p class="price">Price: $20</p>
        <p class="price">Price: €15</p>
    """,
    features="html.parser",
)
element = to_soupsavvy(soup)

discount_selector = ClassSelector("discount")
dollars_selector = PatternSelector(re.compile(r"\$\d+"))
selector = NotSelector(discount_selector, dollars_selector)
selector.find(element)

SoupElement(<p class="price">Price: €15</p>)
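
The equivalence NotSelector(left, right) == ~(left | right) mirrors De Morgan's law: matching neither selector is the same as not matching either of them. The sketch below checks this on plain dictionaries standing in for the three paragraphs above. The helper names is_discount and is_dollars are assumptions made for illustration and do not use soupsavvy.

```python
import re

# Plain-Python stand-ins for the three <p> elements in the example above.
elements = [
    {"classes": {"price", "discount"}, "text": "Price: €10"},
    {"classes": {"price"}, "text": "Price: $20"},
    {"classes": {"price"}, "text": "Price: €15"},
]


def is_discount(el):
    return "discount" in el["classes"]


def is_dollars(el):
    return re.search(r"\$\d+", el["text"]) is not None


# "Matches none of the selectors" ...
neither = [
    el["text"] for el in elements if not is_discount(el) and not is_dollars(el)
]
# ... is the same as "does not match (left OR right)".
not_either = [
    el["text"] for el in elements if not (is_discount(el) or is_dollars(el))
]

print(neither)  # ['Price: €15']
print(neither == not_either)  # True
```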

XORSelector

XORSelector corresponds to the logical XOR operation on selectors, selecting elements that match exactly one of the provided selectors.

CSS Equivalent:
While CSS does not have a direct counterpart, you can achieve similar results using selector list with :not() pseudo-class:

span:not(.discount), .discount:not(span)

Operator: ^

from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, TypeSelector, to_soupsavvy

soup = BeautifulSoup(
    """
        <span class="discount">Buy!</span>
        <p class="price">Price: $10</p>
        <span class="price">Price: $20</span>
        <p class="discount">Price: $30</p>
    """,
    features="lxml",
)
element = to_soupsavvy(soup)

selector = ClassSelector("discount") ^ TypeSelector("span")
selector.find_all(element)

[SoupElement(<span class="price">Price: $20</span>),
 SoupElement(<p class="discount">Price: $30</p>)]
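
To see why the CSS selector list above expresses the same condition as XOR, the sketch below evaluates both forms on plain dictionaries standing in for the four elements of the example. The predicates has_discount and is_span are hypothetical helpers, not soupsavvy objects.

```python
# Plain-Python stand-ins for the four elements in the example above.
elements = [
    {"tag": "span", "classes": {"discount"}, "text": "Buy!"},
    {"tag": "p", "classes": {"price"}, "text": "Price: $10"},
    {"tag": "span", "classes": {"price"}, "text": "Price: $20"},
    {"tag": "p", "classes": {"discount"}, "text": "Price: $30"},
]


def has_discount(el):
    return "discount" in el["classes"]


def is_span(el):
    return el["tag"] == "span"


# XOR: exactly one of the two conditions holds.
xor_matches = [el["text"] for el in elements if has_discount(el) != is_span(el)]

# CSS-style form: span:not(.discount), .discount:not(span)
css_like_matches = [
    el["text"]
    for el in elements
    if (is_span(el) and not has_discount(el))
    or (has_discount(el) and not is_span(el))
]

print(xor_matches)  # ['Price: $20', 'Price: $30']
print(xor_matches == css_like_matches)  # True
```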

Relative Selectors

A relative selector, in addition to its selection criteria, defines a relationship with an anchor element.

According to Mozilla, relative selectors represent elements in relation to anchor element(s), typically introduced by a combinator.

CSS Example:

.discount:has(> p)

In this case, an element of type p is in a child-parent relationship with the element of class discount (the anchor).

In the context of soupsavvy:

Anchor Element: The bs4 object being searched.
Relative Element: Any element that matches the selector and maintains a specified relationship with the anchor element.

Anchor

An alternative way of creating relative selectors is to use the Anchor object with a specific operator. The operators match those used in combinators.

>> -> RelativeDescendant
> -> RelativeChild
<< -> RelativeAncestor
< -> RelativeParent
+ -> RelativeNextSibling
* -> RelativeSubsequentSibling

from bs4 import BeautifulSoup

from soupsavvy import Anchor, ClassSelector, to_soupsavvy

soup = BeautifulSoup(
    """
        <div>
            <span><p class="price">Price: $10</p></span>
            <p class="price">Price: $20</p>
        </div>
    """,
    features="lxml",
)
element = to_soupsavvy(soup)
div = element.find_all("div")[0]

selector = Anchor > ClassSelector("price")
selector.find(div)

SoupElement(<p class="price">Price: $20</p>)

Relative Siblings

RelativeNextSibling and RelativeSubsequentSibling are used to select the following siblings of the anchor element (the element passed to the find methods).

from bs4 import BeautifulSoup

from soupsavvy import Anchor, ClassSelector, to_soupsavvy

soup = BeautifulSoup(
    """
        <p class="title">Animal Farm</p>
        <div class="section">Book 1</div>
        <p class="price">Price: $30</p>
        <p class="discount">Price: $20</p>
        <p class="price">Price: $10</p>
    """,
    features="lxml",
)
element = to_soupsavvy(soup)
div = element.find_all("div")[0]

selector = Anchor + ClassSelector("price")
selector.find_all(div)

[SoupElement(<p class="price">Price: $30</p>)]

Recursivity

Relative selectors in soupsavvy are not affected by the recursive parameter. They have their own independent behavior, determined by the relationship between the anchor and relative elements. Below, the RelativeDescendant relationship takes precedence over the recursive parameter.

from bs4 import BeautifulSoup

from soupsavvy import Anchor, ClassSelector, to_soupsavvy

soup = BeautifulSoup(
    """
        <div>
            <span><p class="price">Price: $10</p></span>
            <p class="price">Price: $20</p>
        </div>
    """,
    features="lxml",
)
element = to_soupsavvy(soup)
div = element.find_all("div")[0]

selector = Anchor >> ClassSelector("price")
selector.find(div, recursive=False)

SoupElement(<p class="price">Price: $10</p>)

HasSelector

The HasSelector is a counterpart of CSS :has() pseudo-class.
According to Mozilla, this pseudo-class matches an element if any relative selectors passed as arguments match at least one element.

CSS Example:

:has(> .price)

This selector matches any element that has a direct child with the class price.

Default Recursive Search

The default combinator for HasSelector is the descendant relationship. This means any selector passed to HasSelector that is not a relative selector will be treated as a relative descendant selector. As a result, if the parent of a matched element is selected, its ancestors will also be included in the selection.

from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, HasSelector, to_soupsavvy

soup = BeautifulSoup(
    """
    <div class="book">
        <span class="title">Brave New World</span>
        <p class="price">Price: $20</p>
    </div>
    <div class="book">
        <span class="title">Animal Farm</span>
        <span>
            <p class="price discount">Price: $15</p>
        </span>
        <p class="price">Price: $20</p>
    </div>
    """,
    features="html.parser",
)
element = to_soupsavvy(soup)

selector = HasSelector(ClassSelector("discount"))
print("\n\n".join(str(element) for element in selector.find_all(element)))

<div class="book">
<span class="title">Animal Farm</span>
<span>
<p class="price discount">Price: $15</p>
</span>
<p class="price">Price: $20</p>
</div>

<span>
<p class="price discount">Price: $15</p>
</span>
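
The effect of the default descendant relationship can be reproduced with the standard library alone. The sketch below is an illustration under stated assumptions, not how soupsavvy implements HasSelector: it wraps the same markup in a hypothetical root element and lists every element that has at least one descendant with the class discount.

```python
import xml.etree.ElementTree as ET

# Same markup as above, wrapped in <root> so that it is well-formed XML.
# <root> plays the role of the searched object and is never returned itself.
root = ET.fromstring(
    """<root>
    <div class="book">
        <span class="title">Brave New World</span>
        <p class="price">Price: $20</p>
    </div>
    <div class="book">
        <span class="title">Animal Farm</span>
        <span>
            <p class="price discount">Price: $15</p>
        </span>
        <p class="price">Price: $20</p>
    </div>
    </root>"""
)


def is_discount(el):
    return "discount" in el.get("class", "").split()


def has_matching_descendant(el):
    # el.iter() yields el itself first, so it is skipped explicitly.
    return any(is_discount(d) for d in el.iter() if d is not el)


matched_tags = [
    el.tag for el in root.iter() if el is not root and has_matching_descendant(el)
]
print(matched_tags)  # ['div', 'span']
```

Both the inner span (the parent of the discounted price) and the enclosing div (its ancestor) qualify, which is the behavior shown in the output above.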

Siblings Search

RelativeNextSibling and RelativeSubsequentSibling can be used to select elements whose next or subsequent sibling matches the selector.

from bs4 import BeautifulSoup

from soupsavvy import Anchor, ClassSelector, HasSelector, to_soupsavvy

soup = BeautifulSoup(
    """
        <p class="title">Animal Farm</p>
        <span>Hello World</span>
        <div class="section">Brave New World</div>
        <p class="price">Price: $30</p>
    """,
    features="lxml",
)
element = to_soupsavvy(soup)

selector = HasSelector(Anchor + ClassSelector("price"))
selector.find(element)

SoupElement(<div class="section">Brave New World</div>)

Ancestors Search

Combining RelativeAncestor and RelativeParent selectors with HasSelector allows you to find elements that have a specific ancestor or parent. For instance, you can locate all elements that have an ancestor with the class breaking. The matched ancestor does not have to be a descendant of the searched bs4 object.

from bs4 import BeautifulSoup

from soupsavvy import Anchor, ClassSelector, HasSelector, to_soupsavvy

soup = BeautifulSoup(
    """
        <div class="breaking">
            <span>
                <span class="info">Important!</span>
                <span>Actual Information</span>
            </span>
        </div>
    """,
    features="lxml",
)
element = to_soupsavvy(soup)

span = element.find_all("span")[0]

selector = HasSelector(Anchor << ClassSelector("breaking"))
selector.find_all(span)

[SoupElement(<span class="info">Important!</span>),
 SoupElement(<span>Actual Information</span>)]

Nth Selectors

These selectors allow you to select the nth element that matches a specific selector.

While CSS offers pseudo-classes like nth-child and nth-of-type to select elements based on their ordinal position among siblings, these selectors count positions among all siblings (or all siblings of the same type), not among the elements that match a given selector.

For example, selecting every 2nd element with the class price cannot be expressed this way because:

.price:nth-child(2n)

selects elements with the class price that sit at even positions among all of their siblings, not every 2nd price element.

In soupsavvy, you can achieve this with:

NthOfSelector(ClassSelector('price'), nth="2n")

This selector selects every 2nd element with the class price.

NthOfSelector

The NthOfSelector enables you to select elements based on a specified occurrence pattern defined by the nth rule. The provided nth parameter must follow valid CSS syntax (<An+B>, even, or odd).

from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, NthOfSelector, to_soupsavvy

soup = BeautifulSoup(
    """
        <span class="title">Animal Farm</span>
        <p class="price discount">Price: €1</p>
        <p class="price">Price: $2</p>
        <span>Bestseller</span>
        <p class="price">Price: €3</p>
        <p class="price">Price: €4</p>
        <p class="price">Price: €5</p>
    """,
    features="lxml",
)
element = to_soupsavvy(soup)

selector = NthOfSelector(ClassSelector("price"), nth="2n")
selector.find_all(element)

[SoupElement(<p class="price">Price: $2</p>),
 SoupElement(<p class="price">Price: €4</p>)]

NthLastOfSelector

The NthLastOfSelector functions similarly to the NthOfSelector, but it counts elements from the end of the list.

import re

from bs4 import BeautifulSoup

from soupsavvy import NthLastOfSelector, PatternSelector, to_soupsavvy

soup = BeautifulSoup(
    """
        <span class="title">Animal Farm</span>
        <p class="price discount">Price: €1</p>
        <p>Price: $2</p>
        <span>Bestseller</span>
        <p class="price">Price: €3</p>
        <p>Price: €4</p>
        <p class="price">Price: €5</p>
    """,
    features="lxml",
)
element = to_soupsavvy(soup)

selector = NthLastOfSelector(
    PatternSelector(re.compile("^price", re.IGNORECASE)),
    nth="odd",
)
selector.find_all(element)

[SoupElement(<p class="price discount">Price: €1</p>),
 SoupElement(<p class="price">Price: €3</p>),
 SoupElement(<p class="price">Price: €5</p>)]
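
The An+B rule used in the two examples above can be worked through by hand. The sketch below is a simplified model of CSS An+B semantics, not soupsavvy's implementation: the hypothetical function nth_positions returns the 1-based positions, among the elements that already match the inner selector, that a rule would pick, counting either from the start or from the end.

```python
import re


def nth_positions(nth: str, count: int, from_end: bool = False) -> list[int]:
    """Return the 1-based positions picked by an An+B rule among `count` matches."""
    rule = nth.replace(" ", "").lower()
    if rule == "odd":
        a, b = 2, 1
    elif rule == "even":
        a, b = 2, 0
    else:
        parsed = re.fullmatch(r"(?:([+-]?\d*)n)?([+-]?\d+)?", rule)
        if not rule or parsed is None:
            raise ValueError(f"Unsupported nth rule: {nth!r}")
        a_text, b_text = parsed.groups()
        if a_text is None:
            a = 0  # no "n" term, e.g. "3"
        elif a_text in ("", "+", "-"):
            a = int(a_text + "1")  # "n", "+n" or "-n"
        else:
            a = int(a_text)
        b = int(b_text) if b_text else 0

    def selected(position: int) -> bool:
        # A position is picked when a*n + b == position for some integer n >= 0.
        offset = position - b
        if a == 0:
            return offset == 0
        return offset % a == 0 and offset // a >= 0

    picked = [p for p in range(1, count + 1) if selected(p)]
    if from_end:
        # Counting from the end: position p from the end is count + 1 - p from the start.
        picked = sorted(count + 1 - p for p in picked)
    return picked


print(nth_positions("2n", 5))                   # [2, 4]
print(nth_positions("odd", 5, from_end=True))   # [1, 3, 5]
print(nth_positions("2n+1", 4, from_end=True))  # [2, 4]
```

With five matching elements, "2n" picks the 2nd and 4th matches, and "odd" counted from the end picks the 5th, 3rd and 1st, which agrees with the outputs of the NthOfSelector and NthLastOfSelector examples.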

OnlyOfSelector

The OnlyOfSelector selects an element only if it is the sole element among its siblings that matches the specified selector.
If more than one sibling matches the selector, none of them is selected.

from bs4 import BeautifulSoup

from soupsavvy import ClassSelector, OnlyOfSelector, to_soupsavvy

soup = BeautifulSoup(
    """
    <div class="book">
        <span class="title">Animal Farm</span>
        <p class="price">Price: $15</p>
        <p class="price">Price: $20</p>
    </div>
    <div class="book">
        <span class="title">Frankenstein</span>
        <p class="price">Price: $30</p>
    </div>
    """,
    features="html.parser",
)
element = to_soupsavvy(soup)

selector = OnlyOfSelector(ClassSelector("price"))
selector.find(element)

SoupElement(<p class="price">Price: $30</p>)

Operators module

As an alternative way to combine selectors, soupsavvy provides convenient operator functions in the soupsavvy.operators module, offering shortcuts for composite selectors:

and_ -> AndSelector
or_ -> SelectorList
is_ -> SelectorList
where -> SelectorList
not_ -> NotSelector
has -> HasSelector
xor -> XORSelector

These functions can enhance clarity and conciseness in some contexts.

from soupsavvy import ClassSelector, TypeSelector
from soupsavvy.operators import and_

and_(ClassSelector("price"), TypeSelector("p"))

AndSelector(ClassSelector('price'), TypeSelector(name='p'))

from soupsavvy import ClassSelector, TypeSelector
from soupsavvy.operators import has

has(ClassSelector("price"), TypeSelector("p"))

HasSelector(ClassSelector('price'), TypeSelector(name='p'))

Conclusion

soupsavvy provides a wide range of composite selectors that can be used to create more complex search criteria. Designed for flexibility and easy customization, they let you tailor searches to meet specific needs.

Enjoy soupsavvy and leave us feedback!
Happy scraping!