selectors.general
Module with miscellaneous selectors.
Classes
TypeSelector - matches elements based on tag name (type)
PatternSelector - matches elements based on text content and selector
UniversalSelector - universal selector (*)
SelfSelector - matches the element itself
ExpressionSelector - matches elements based on user-defined function
- class TypeSelector(name: str)
Bases: SoupSelector, SelectableCSS
Selector for finding elements based on tag name (type). Counterpart of CSS type selectors.
Example
>>> TypeSelector("div")
matches all elements that have the “div” tag name.
Example
>>> <div class="widget">Hello World</div> ✔️
>>> <a href="/shop">Hello World</a> ❌
The CSS counterpart can be represented as:
Example
>>> div
It can be retrieved with the css property.
Example
>>> TypeSelector("div").css
"div"
Parameters
- name: str
Tag name of the element, e.g. “a”, “div”.
Notes
For more information about type selectors, see:
https://developer.mozilla.org/en-US/docs/Web/CSS/Type_selectors
- name: str
- find_all(tag: IElement, recursive: bool = True, limit: int | None = None) -> list[IElement]
Finds all elements matching the selector in the provided IElement.
Parameters
- tag: IElement
Any IElement object to search within.
- recursive: bool, optional
Specifies whether the search should be recursive. If set to False, only direct children of the element are searched. By default True.
- limit: int, optional
Specifies the maximum number of elements to return. By default None, in which case all found elements are returned.
Returns
- list[IElement]
List of IElement objects matching selector. If none found, the list is empty.
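The effect of the recursive and limit parameters can be illustrated with a small sketch. It does not use soupsavvy: Node, descendants and find_all_by_name are hypothetical stand-ins written for this illustration, and the depth-first document order of results is an assumption rather than something the reference above states.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for an element; not a soupsavvy type."""

    name: str
    children: list[Node] = field(default_factory=list)


def descendants(node):
    """Yield every descendant of node, depth-first (assumed order)."""
    for child in node.children:
        yield child
        yield from descendants(child)


def find_all_by_name(tag, name, recursive=True, limit=None):
    """Collect elements named `name` inside `tag`, mirroring the documented parameters."""
    # recursive=False restricts the search to direct children only.
    candidates = descendants(tag) if recursive else iter(tag.children)
    found = []
    for element in candidates:
        # Stop as soon as the requested number of matches is reached.
        if limit is not None and len(found) >= limit:
            break
        if element.name == name:
            found.append(element)
    return found


tree = Node("body", [Node("div", [Node("div")]), Node("a"), Node("div")])

all_divs = find_all_by_name(tree, "div")                      # nested div included
direct_divs = find_all_by_name(tree, "div", recursive=False)  # direct children only
first_div = find_all_by_name(tree, "div", limit=1)            # at most one result
no_spans = find_all_by_name(tree, "span")                     # empty list, not None
```

In this sketch the nested div is found only when recursive is True, and a search with no matches returns an empty list, as the Returns section describes.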
- property css: str
Returns a string representing the CSS selector of the element.
- __init__(name: str) -> None
- class PatternSelector(pattern: Pattern[str] | str)
Bases: SoupSelector
Selector for finding elements based on a text content pattern.
Example
>>> PatternSelector("Hello World")
matches all elements with the exact text content “Hello World”.
Example
>>> <div>Hello World</div> ✔️
>>> <div>Hello Python</div> ❌
>>> <div>Hello World 3</div> ❌
When a regex pattern is used, re.search is used to match the text content.
Example
>>> PatternSelector(re.compile(r"[0-9]+"))
matches all elements with text content containing at least one digit.
Example
>>> <div>Hello World 123</div> ✔️
>>> <div>Hello World</div> ❌
Parameters
- pattern: str | Pattern
Pattern to match against the text of the element. Can be a string for an exact match or a compiled Pattern for more complex regular expressions.
Notes
An element does not match the pattern if it has any children. Only leaf nodes can be returned by PatternSelector find methods.
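The matching rules described above (exact comparison for strings, re.search for compiled patterns, leaf nodes only) can be sketched with the standard library alone. This is an illustration of the documented behaviour, not soupsavvy's implementation: text_matches and find_by_text are hypothetical helpers, and xml.etree.ElementTree stands in for the real element type.

```python
import re
import xml.etree.ElementTree as ET


def text_matches(pattern, text):
    """Exact comparison for str patterns, re.search for compiled ones."""
    if isinstance(pattern, re.Pattern):
        return pattern.search(text) is not None
    return text == pattern


def find_by_text(root, pattern):
    """Return leaf elements whose text satisfies the pattern."""
    return [
        element
        for element in root.iter()
        # len(element) == 0 means the element has no children (leaf node).
        if len(element) == 0
        and element.text is not None
        and text_matches(pattern, element.text)
    ]


markup = (
    "<body>"
    "<div>Hello World</div>"
    "<div>Hello Python</div>"
    "<div>Hello World 3</div>"
    "<p>Hello World<b>!</b></p>"  # has a child, so it is skipped
    "</body>"
)
root = ET.fromstring(markup)

exact = find_by_text(root, "Hello World")
digits = find_by_text(root, re.compile(r"[0-9]+"))
```

Here the string pattern selects only the first div, because “Hello World 3” is not an exact match and the p element has a child. The compiled pattern selects only the div whose text contains a digit.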
- pattern: Pattern[str] | str
- find_all(tag: IElement, recursive: bool = True, limit: int | None = None) -> list[IElement]
Finds all elements matching the selector in the provided IElement.
Parameters
- tag: IElement
Any IElement object to search within.
- recursive: bool, optional
Specifies whether the search should be recursive. If set to False, only direct children of the element are searched. By default True.
- limit: int, optional
Specifies the maximum number of elements to return. By default None, in which case all found elements are returned.
Returns
- list[IElement]
List of IElement objects matching selector. If none found, the list is empty.
- __init__(pattern: Pattern[str] | str) -> None
- class UniversalSelector
Bases: SoupSelector, SelectableCSS
Selector representing a wildcard pattern that matches all elements in the HTML page.
Example
>>> UniversalSelector()
The CSS counterpart can be represented as:
Example
>>> *
It can be retrieved with the css property.
Example
>>> UniversalSelector().css
"*"
Notes
For more information on the universal selector, see:
https://developer.mozilla.org/en-US/docs/Web/CSS/Universal_selectors
- find_all(tag: IElement, recursive: bool = True, limit: int | None = None) -> list[IElement]
Finds all elements matching the selector in the provided IElement.
Parameters
- tag: IElement
Any IElement object to search within.
- recursive: bool, optional
Specifies whether the search should be recursive. If set to False, only direct children of the element are searched. By default True.
- limit: int, optional
Specifies the maximum number of elements to return. By default None, in which case all found elements are returned.
Returns
- list[IElement]
List of IElement objects matching selector. If none found, the list is empty.
- property css: str
Returns the wildcard CSS selector matching all elements in the markup.
- __init__() -> None
- class AnyTagSelector(**kwargs)
Bases: UniversalSelector
Alias for the UniversalSelector class. Deprecated component.
- class SelfSelector
Bases: SoupSelector
Selector matching only the element itself. Convenience component that can be used for compatibility.
Example
>>> SelfSelector()
always matches the tag that is passed to the find methods.
Notes
Can be used as the scope in a user-defined model when the element itself is the scope.
- find_all(tag: IElement, recursive: bool = True, limit: int | None = None) -> list[IElement]
Finds all elements matching the selector in the provided IElement.
Parameters
- tag: IElement
Any IElement object to search within.
- recursive: bool, optional
Specifies whether the search should be recursive. If set to False, only direct children of the element are searched. By default True.
- limit: int, optional
Specifies the maximum number of elements to return. By default None, in which case all found elements are returned.
Returns
- list[IElement]
List of IElement objects matching selector. If none found, the list is empty.
- class ExpressionSelector(f: Callable[[IElement], bool])
Bases: SoupSelector
Selector that matches elements based on a user-defined function (predicate), which is used as a filter on element objects.
Applies the predicate to each element and returns those that satisfy the condition.
Parameters
- f: Callable[[IElement], bool]
A user-defined function (predicate) that determines whether the element should be selected.
Examples
>>> selector = ExpressionSelector(lambda x: x.name not in {"a", "div"})
>>> selector.find(soup)
To operate on the underlying node, use the IElement.get() method or the IElement.node attribute.
Example
>>> selector = ExpressionSelector(lambda x: 'widget' in x.node['class'])
This example applies to a SoupElement object, which wraps bs4.Tag.
Notes
Any exceptions should be handled inside the provided function. If an exception is raised, it is propagated to the caller.
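The note on exceptions can be made concrete with a short sketch of predicate-based filtering. It does not use soupsavvy: Element, iter_descendants and find_all_matching are hypothetical stand-ins, and a plain dict lookup plays the role of attribute access on the underlying node. The point is only that a predicate which assumes an attribute exists fails on the first element lacking it, so defensive lookups belong inside the predicate.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical stand-in for an element; not a soupsavvy type."""

    name: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def iter_descendants(element):
    """Yield every descendant of element, depth-first."""
    for child in element.children:
        yield child
        yield from iter_descendants(child)


def find_all_matching(tag, predicate):
    """Keep the descendants for which the predicate returns True."""
    # No try/except here: an error raised by the predicate reaches the caller.
    return [element for element in iter_descendants(tag) if predicate(element)]


root = Element(
    "body",
    children=[
        Element("div", {"class": ["widget"]}),
        Element("a", {"href": "/shop"}),
        Element("span"),
    ],
)


def unsafe(x):
    # Raises KeyError for elements without a "class" attribute.
    return "widget" in x.attrs["class"]


def safe(x):
    # Handles the missing attribute inside the predicate.
    return "widget" in x.attrs.get("class", [])


not_links_or_divs = find_all_matching(root, lambda x: x.name not in {"a", "div"})
widgets = find_all_matching(root, safe)

try:
    find_all_matching(root, unsafe)
    raised = False
except KeyError:
    raised = True
```

The safe predicate returns the single widget element, while the unsafe one raises on the a element and the error surfaces in the calling code.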
- find_all(tag: IElement, recursive: bool = True, limit: int | None = None) -> list[IElement]
Finds all elements matching the selector in the provided IElement.
Parameters
- tag: IElement
Any IElement object to search within.
- recursive: bool, optional
Specifies whether the search should be recursive. If set to False, only direct children of the element are searched. By default True.
- limit: int, optional
Specifies the maximum number of elements to return. By default None, in which case all found elements are returned.
Returns
- list[IElement]
List of IElement objects matching selector. If none found, the list is empty.