{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# CSS Selectors" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `soupsavvy.selectors.css` subpackage provides a set of CSS-based selectors, built as wrappers around the [`soupsieve`](https://github.com/facelessuser/soupsieve) library, *'a modern CSS selector implementation for BeautifulSoup'*. These selectors can be seamlessly combined with other `soupsavvy` selectors, allowing for flexible use of pure CSS and common [`pseudo-classes`](https://developer.mozilla.org/en-US/docs/Web/CSS/Pseudo-classes)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Child Selectors" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Child selectors target elements based on their position among siblings within a parent element. While `nth-child` can handle any position-based selection, `soupsavvy` offers convenient wrappers for several frequently used CSS pseudo-classes." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### FirstChild\n", "\n", "The `FirstChild` selector matches every element that is the first child of its parent.\n", "\n", "```css\n", ":first-child\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from bs4 import BeautifulSoup\n", "\n", "from soupsavvy.selectors.css import FirstChild\n", "from soupsavvy import to_soupsavvy\n", "\n", "soup = BeautifulSoup(\n", " \"\"\"\n", "
First
\n", "1
\n", "2
\n", "3
\n", "4
\n", "5
\n", "6
\n", " \"\"\",\n", " features=\"html.parser\",\n", ")\n", "element = to_soupsavvy(soup)\n", "\n", "selector = NthChild(\"2n\")\n", "selector.find_all(element)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### NthLastChild\n", "\n", "The `NthLastChild` selector allows you to select elements based on their position among their siblings, counting from the last child of the parent element.\n", "\n", "```css\n", ":nth-last-child(3)\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from bs4 import BeautifulSoup\n", "\n", "from soupsavvy.selectors.css import NthLastChild\n", "from soupsavvy import to_soupsavvy\n", "\n", "soup = BeautifulSoup(\n", " \"\"\"\n", "1
\n", "2
\n", "3
\n", "4
\n", "5
\n", "6
\n", " \"\"\",\n", " features=\"html.parser\",\n", ")\n", "element = to_soupsavvy(soup)\n", "\n", "selector = NthLastChild(\"odd\")\n", "selector.find_all(element)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### OnlyChild\n", "\n", "The `OnlyChild` selector matches elements that are the only child of their parent.\n", "\n", "```css\n", ":only-child\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from bs4 import BeautifulSoup\n", "\n", "from soupsavvy.selectors.css import OnlyChild\n", "from soupsavvy import to_soupsavvy\n", "\n", "soup = BeautifulSoup(\n", " \"\"\"\n", "Text
\n", "Only child
First p
\n", "Last p
\n", "1
\n", " 1\n", "2
\n", " 2\n", "3
\n", " 3\n", "4
\n", " 4\n", " \"\"\",\n", " features=\"html.parser\",\n", ")\n", "element = to_soupsavvy(soup)\n", "\n", "selector = NthOfType(\"2n+2\")\n", "selector.find_all(element)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### NthLastOfType\n", "\n", "Selects every element that is the nth of its type among its siblings, counting from the last.\n", "\n", "```css\n", ":nth-last-of-type(n)\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from bs4 import BeautifulSoup\n", "\n", "from soupsavvy.selectors.css import NthLastOfType\n", "from soupsavvy import to_soupsavvy\n", "\n", "soup = BeautifulSoup(\n", " \"\"\"\n", "1
\n", " 1\n", "2
\n", " 2\n", "3
\n", " 3\n", "4
\n", " 4\n", " \"\"\",\n", " features=\"html.parser\",\n", ")\n", "element = to_soupsavvy(soup)\n", "\n", "selector = NthLastOfType(\"-n+2\")\n", "selector.find_all(element)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### OnlyOfType\n", "\n", "Selects every element that is the only child of its type within its parent.\n", "\n", "```css\n", ":only-of-type\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from bs4 import BeautifulSoup\n", "\n", "from soupsavvy.selectors.css import OnlyOfType\n", "from soupsavvy import to_soupsavvy\n", "\n", "soup = BeautifulSoup(\n", " \"\"\"\n", "Only p
\n", "Text
\n", "Text
\n", " \n", " \n", " \"\"\",\n", " features=\"html.parser\",\n", ")\n", "element = to_soupsavvy(soup)\n", "\n", "selector = TypeSelector(\"div\") > (~Empty())\n", "selector.find_all(element)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To find all elements that have one child and are the last child of their parent, the following selector can be used:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from bs4 import BeautifulSoup\n", "\n", "from soupsavvy import Anchor, HasSelector, to_soupsavvy\n", "from soupsavvy.selectors.css import LastChild, OnlyChild\n", "\n", "soup = BeautifulSoup(\n", " \"\"\"\n", "Text
\n", "