astral-tl-0.7.10/.cargo_vcs_info.json 0000644 00000000136 00000000001 0013024 0 ustar {
"git": {
"sha1": "0f85e8be8a6e913e27b67f7fa362d17969db74d3"
},
"path_in_vcs": ""
} astral-tl-0.7.10/.gitignore 0000644 0000000 0000000 00000000036 10461020230 0013603 0 ustar 0000000 0000000 /target
Cargo.lock
*.html
*.js astral-tl-0.7.10/CHANGELOG.md 0000644 0000000 0000000 00000012675 10461020230 0013440 0 ustar 0000000 0000000 Changes annotated with `⚠` are breaking.
# 0.7.8
- Fixes a build error if compiled with the `simd` feature flag. See [y21/tl#60]
- Fixes MDN-related doc comments ([y21/tl#51])
# 0.7.7
- Fixes a bug in the query selector parser that made it fail to parse values
containing `:`. See [y21/tl#46](https://github.com/y21/tl/issues/46) and
[y21/tl#47] for more details.
# 0.7.6
- Fixes a build error if compiled with the `simd` feature flag. See
[y21/tl#41](https://github.com/y21/tl/issues/41) for more details.
- ⚠ In prior versions, `innerHTML()` actually had the behavior of
`Element#outerHTML`. This was changed and `innerHTML` now correctly only
returns the markup of its subnodes, and not the markup of the own node.
- `outerHTML()` was added to nodes, which moves the old behavior to another
function.
- Added `children_mut()`, which allows mutating the subnodes of an HTML Tag.
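
The difference between the two methods can be illustrated with a minimal, self-contained sketch. The `Node` enum below is a hypothetical toy type, not the crate's own `Node`, and it ignores attributes entirely; it only shows that the outer markup wraps the inner markup in the node's own tags.

```rust
/// Toy DOM node used only for illustration (hypothetical, not the crate's type).
enum Node {
    Text(String),
    Tag { name: String, children: Vec<Node> },
}

impl Node {
    /// Markup of the node itself plus everything inside it.
    fn outer_html(&self) -> String {
        match self {
            Node::Text(text) => text.clone(),
            Node::Tag { name, .. } => {
                format!("<{}>{}</{}>", name, self.inner_html(), name)
            }
        }
    }

    /// Markup of the subnodes only, without the node's own tags.
    fn inner_html(&self) -> String {
        match self {
            Node::Text(text) => text.clone(),
            Node::Tag { children, .. } => children.iter().map(Node::outer_html).collect(),
        }
    }
}
```

For a `div` containing a single `p` with the text `hi`, `inner_html()` yields `<p>hi</p>` while `outer_html()` yields `<div><p>hi</p></div>`.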
# 0.7.5
- Fixed a bug that caused the parser to parse closing tags incorrectly. See
[y21/tl#37](https://github.com/y21/tl/issues/37) and
[y21/tl#38](https://github.com/y21/tl/pull/38) for more details.
# 0.7.4
- Restructure internals (mainly SIMD functions)
- Add fuzzing targets for internals
- Optimize stable parser (adds stable alternatives when the `simd` feature isn't
set)
# 0.7.3
- Fixed `HTMLTag::raw()` returning one byte less than it should have. See
[y21/tl#31](https://github.com/y21/tl/issues/31).
# 0.7.2
- Add `Attributes::contains(key)` to check if an attribute exists.
- Add `Attributes::remove(key)` to remove an attribute.
- Add `Attributes::remove_value(key)` to delete the value of a given attribute
key.
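
As a rough mental model of these three methods, an attribute map can be thought of as a map from keys to optional values, where a missing value stands for a valueless attribute. The `Attrs` type below is a hypothetical stand-in built on `HashMap`, not the crate's `Attributes` implementation, and its method signatures are assumptions chosen for the sketch.

```rust
use std::collections::HashMap;

/// Toy attribute map (hypothetical, not the crate's `Attributes`).
/// A value of `None` models a valueless attribute such as `disabled`.
#[derive(Default)]
struct Attrs {
    map: HashMap<String, Option<String>>,
}

impl Attrs {
    fn insert(&mut self, key: &str, value: Option<&str>) {
        self.map.insert(key.to_string(), value.map(str::to_string));
    }

    /// True if the attribute exists, with or without a value.
    fn contains(&self, key: &str) -> bool {
        self.map.contains_key(key)
    }

    /// Removes the attribute entirely.
    fn remove(&mut self, key: &str) {
        self.map.remove(key);
    }

    /// Keeps the attribute but drops its value.
    fn remove_value(&mut self, key: &str) {
        if let Some(value) = self.map.get_mut(key) {
            *value = None;
        }
    }

    /// Outer `Option`: does the key exist? Inner `Option`: does it have a value?
    fn get(&self, key: &str) -> Option<Option<&str>> {
        self.map.get(key).map(|value| value.as_deref())
    }
}
```

In this model, `remove_value("src")` leaves `get("src")` at `Some(None)`, whereas `remove("src")` makes it `None`.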
# 0.7.1
- Version bump in README.md
# 0.7.0
> **Warning: This release contains breaking changes**
- ⚠ Function signature of `Attributes::insert` has changed:
- It now takes two generic parameters `K, V` instead of just one. Prior to
this version, this meant that the key and value type had to match. See
[y21/tl#27](https://github.com/y21/tl/pull/26) for more details.
- Added a `TryFrom for Bytes` implementation for convenience to create
owned `Bytes`.
- Added `HTMLTag::boundaries` method for obtaining the start and end position of
a tag in the source string.
- Fixed a panic when source string abruptly ends with ``) is interpreted as `>` and
causes the next `>` to be interpreted as a text node on its own.
# 0.6.1
- Fixed an off-by-one error in the `QueryIterable` trait implementation for
`HTMLTag` that caused query selectors on HTML tags to return one node less
than they should.
# 0.6.0
> **Warning: This release contains breaking changes**
- ⚠ Removed deprecated method `VDom::find_node`
- Alternative: use `VDom::nodes().iter().find(...)` instead
- ⚠ `Attributes::get()` now returns a reference to `Bytes` instead of cloning.
- Prior to this version, it wasn't necessary to return a reference as the
`Bytes` type was just an immutable `&[u8]`. Now it can hold owned data.
- ⚠ `HTMLTag::children()` no longer returns an iterator, and instead returns a
wrapper struct around the children of the HTML tag. This wrapper struct makes
it easy to obtain direct children of the tag (`Children::top()`), or all
children (including their children, etc...) (`Children::all()`).
- ⚠ `Node::children()` no longer returns an iterator (see above).
- ⚠ `HTMLTag::name()` now returns a reference to `Bytes` instead of cloning
(see above).
- Ability to create/parse query selectors independent of any parser
(`tl::parse_query_selector`)
- Ability to reuse query selectors
- Ability to apply query selectors on `HTMLTag`s (see
[#18](https://github.com/y21/tl/issues/18))
- `queryselector` module is now public
- `InnerNodeHandle` is now u32
- Remove unused `max_depth` parser option
- Add convenience `PartialEq<str>` and `PartialEq<[u8]>` impls for Bytes
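
The distinction between direct children and all children described above can be sketched with a small node arena addressed by `u32` handles. The `Arena` type and its layout are assumptions made for this illustration; the crate's internal representation may differ.

```rust
/// Toy node arena (hypothetical layout, not the crate's internals).
struct Arena {
    /// For each node, the handles (indices) of its direct children.
    children: Vec<Vec<u32>>,
}

impl Arena {
    /// Direct children only.
    fn top(&self, node: u32) -> &[u32] {
        &self.children[node as usize]
    }

    /// All descendants (children, their children, and so on) in depth-first order.
    fn all(&self, node: u32) -> Vec<u32> {
        let mut out = Vec::new();
        // Push in reverse so that the first child is popped first.
        let mut stack: Vec<u32> = self.top(node).iter().rev().copied().collect();
        while let Some(current) = stack.pop() {
            out.push(current);
            stack.extend(self.top(current).iter().rev().copied());
        }
        out
    }
}
```

With node 0 holding children 1 and 4, and node 1 holding children 2 and 3, `top(0)` returns handles 1 and 4, while `all(0)` returns 1, 2, 3 and 4.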
# 0.5.0
> **Warning: This release contains breaking changes**
- Allow `Bytes` to store owned data through `Bytes::set()`
- ⚠ The maximum length for `Bytes` is `u32::MAX`
- ⚠ `tl::parse()` now returns `Result<VDom<'a>, ParseError>`
- ⚠ `Attributes` fields are no longer public, instead use one of the provided
methods
- ⚠ `HTMLTag::inner_html()` now takes a `&Parser` and no longer directly
returns the substring
- Node mutations to the tag or any of its subnodes mean `inner_html` needs to
be recomputed
- Consider using `HTMLTag::raw()` if you never mutate any nodes
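
One way to picture a byte string that starts out borrowed from the source and becomes owned after a `set()` call is a two-variant enum. The `ToyBytes` and `TooLong` types below are hypothetical and only approximate the behavior described in this release; the crate's `Bytes` type is implemented differently.

```rust
/// Toy byte string (hypothetical, not the crate's `Bytes`).
#[derive(Clone, Debug)]
enum ToyBytes<'a> {
    /// Points into the original source; cloning copies only the reference.
    Borrowed(&'a [u8]),
    /// Heap-allocated data; cloning copies the bytes.
    Owned(Box<[u8]>),
}

/// Error for data longer than `u32::MAX` bytes (name is an assumption).
#[derive(Debug, PartialEq)]
struct TooLong;

impl<'a> ToyBytes<'a> {
    /// Replaces the contents with an owned copy of `data`.
    fn set(&mut self, data: &[u8]) -> Result<(), TooLong> {
        if data.len() > u32::MAX as usize {
            return Err(TooLong);
        }
        *self = ToyBytes::Owned(data.into());
        Ok(())
    }

    fn as_bytes(&self) -> &[u8] {
        match self {
            ToyBytes::Borrowed(bytes) => bytes,
            ToyBytes::Owned(bytes) => bytes,
        }
    }

    /// Returns the slice with the source lifetime, if the data is still borrowed.
    fn as_bytes_borrowed(&self) -> Option<&'a [u8]> {
        match self {
            ToyBytes::Borrowed(bytes) => Some(*bytes),
            ToyBytes::Owned(_) => None,
        }
    }
}
```

In this sketch, a clone of a borrowed value shares the same pointer, a clone of an owned value gets its own allocation, and `as_bytes_borrowed()` returns `None` once `set()` has been called.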
# 0.4.4
- Parse unquoted attribute values properly (``) [#12]
- Parse valueless attributes properly (`
"#;
let dom = parse(input, ParserOptions::default()).unwrap();
let parser = dom.parser();
let element = dom
.nodes()
.iter()
.find(|x| x.as_tag().is_some_and(|x| x.name().eq("a")));
assert_eq!(element.map(|x| x.inner_text(parser)), Some("nested".into()));
}
#[test]
fn fuzz() {
// Some tests that would previously panic or end in an infinite loop
// We don't need to assert anything here, just see that they finish
parse("J\x00<", ParserOptions::default()).unwrap();
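// Many unclosed opening tags in a row; use fewer iterations under Miri, which is slow.
let count = if cfg!(miri) { 100usize } else { 10000usize };
parse(&"<p>".repeat(count), ParserOptions::default()).unwrap();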
}
#[test]
fn mutate_dom() {
let input = r#"<img src="test.png" />"#;
let mut dom = parse(input, ParserOptions::default()).unwrap();
let mut selector = dom.query_selector("[src]").unwrap();
let handle = selector.next().unwrap();
let parser = dom.parser_mut();
let el = handle.get_mut(parser).unwrap();
let tag = el.as_tag_mut().unwrap();
let attr = tag.attributes_mut();
let bytes = attr.get_mut("src").flatten().unwrap();
bytes.set("world.png").unwrap();
assert_eq!(attr.get("src"), Some(Some(&"world.png".into())));
}
mod simd {
// These tests make sure that SIMD functions do the right thing
#[test]
fn matches_case_insensitive_test() {
assert!(crate::simd::matches_case_insensitive(b"hTmL", *b"html"));
assert!(!crate::simd::matches_case_insensitive(b"hTmLs", *b"html"));
assert!(!crate::simd::matches_case_insensitive(b"hTmy", *b"html"));
assert!(!crate::simd::matches_case_insensitive(b"/Tmy", *b"html"));
}
#[test]
fn string_search() {
assert_eq!(crate::simd::find(b"a", b' '), None);
assert_eq!(crate::simd::find(b"", b' '), None);
assert_eq!(crate::simd::find(b"a ", b' '), Some(1));
assert_eq!(crate::simd::find(b"abcd ", b' '), Some(4));
assert_eq!(crate::simd::find(b"ab cd ", b' '), Some(2));
assert_eq!(crate::simd::find(b"abcdefgh ", b' '), Some(8));
assert_eq!(crate::simd::find(b"abcdefghi ", b' '), Some(9));
assert_eq!(crate::simd::find(b"abcdefghi", b' '), None);
assert_eq!(crate::simd::find(b"abcdefghiabcdefghi .", b' '), Some(18));
assert_eq!(crate::simd::find(b"abcdefghiabcdefghi.", b' '), None);
let count = if cfg!(miri) { 500usize } else { 1000usize };
let long = "a".repeat(count) + "b";
assert_eq!(crate::simd::find(long.as_bytes(), b'b'), Some(count));
}
#[test]
fn string_search_3() {
const NEEDLE: [u8; 3] = [b'a', b'b', b'c'];
assert_eq!(crate::simd::find3(b"e", NEEDLE), None);
assert_eq!(crate::simd::find3(b"a", NEEDLE), Some(0));
assert_eq!(crate::simd::find3(b"ea", NEEDLE), Some(1));
assert_eq!(crate::simd::find3(b"ef", NEEDLE), None);
assert_eq!(crate::simd::find3(b"ef a", NEEDLE), Some(3));
assert_eq!(crate::simd::find3(b"ef g", NEEDLE), None);
assert_eq!(crate::simd::find3(b"ef ghijk", NEEDLE), None);
assert_eq!(crate::simd::find3(b"ef ghijkl", NEEDLE), None);
assert_eq!(crate::simd::find3(b"ef ghijkla", NEEDLE), Some(9));
assert_eq!(crate::simd::find3(b"ef ghiajklm", NEEDLE), Some(6));
assert_eq!(crate::simd::find3(b"ef ghibjklm", NEEDLE), Some(6));
assert_eq!(crate::simd::find3(b"ef ghicjklm", NEEDLE), Some(6));
assert_eq!(crate::simd::find3(b"ef ghijklmnopqrstua", NEEDLE), Some(18));
assert_eq!(crate::simd::find3(b"ef ghijklmnopqrstub", NEEDLE), Some(18));
assert_eq!(crate::simd::find3(b"ef ghijklmnopqrstuc", NEEDLE), Some(18));
assert_eq!(crate::simd::find3(b"ef ghijklmnopqrstu", NEEDLE), None);
}
#[test]
#[rustfmt::skip]
fn search_non_ident() {
assert_eq!(crate::simd::search_non_ident(b"this-is-a-very-long-identifier<"), Some(30));
assert_eq!(crate::simd::search_non_ident(b"0123456789Abc_-<"), Some(15));
assert_eq!(crate::simd::search_non_ident(b"0123456789Abc-<"), Some(14));
assert_eq!(crate::simd::search_non_ident(b"0123456789Abcdef_-<"), Some(18));
assert_eq!(crate::simd::search_non_ident(b""), None);
assert_eq!(crate::simd::search_non_ident(b"short"), None);
assert_eq!(crate::simd::search_non_ident(b"short_<"), Some(6));
assert_eq!(crate::simd::search_non_ident(b"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-_"), None);
assert_eq!(crate::simd::search_non_ident(b"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-_<"), Some(64));
assert_eq!(crate::simd::search_non_ident(b">"), Some(0)); // 0x3E
assert_eq!(crate::simd::search_non_ident(b"@"), Some(0)); // 0x40 (just before 'A')
assert_eq!(crate::simd::search_non_ident(b"["), Some(0)); // 0x5B (just after 'Z')
assert_eq!(crate::simd::search_non_ident(b"`"), Some(0)); // 0x60 (just before 'a')
assert_eq!(crate::simd::search_non_ident(b"{"), Some(0)); // 0x7B (just after 'z')
assert_eq!(crate::simd::search_non_ident(b" "), Some(0)); // Space
assert_eq!(crate::simd::search_non_ident(b"="), Some(0)); // Equals
// Test valid identifier characters that might seem like they shouldn't be ('/', ':', and '+'
// are valid).
assert_eq!(crate::simd::search_non_ident(b"/"), None); // '/' IS an identifier
assert_eq!(crate::simd::search_non_ident(b":"), None); // ':' IS an identifier
assert_eq!(crate::simd::search_non_ident(b"+"), None); // '+' IS an identifier
// Test non-identifiers in the middle of valid identifiers.
assert_eq!(crate::simd::search_non_ident(b"abc<def"), Some(3));
assert_eq!(crate::simd::search_non_ident(b"abc>def"), Some(3));
assert_eq!(crate::simd::search_non_ident(b"abc@def"), Some(3));
assert_eq!(crate::simd::search_non_ident(b"abc[def"), Some(3));
assert_eq!(crate::simd::search_non_ident(b"abc`def"), Some(3));
assert_eq!(crate::simd::search_non_ident(b"abc{def"), Some(3));
// Test non-identifier at each position in first 16-byte chunk.
assert_eq!(crate::simd::search_non_ident(b"<234567890123456"), Some(0));
assert_eq!(crate::simd::search_non_ident(b"0<34567890123456"), Some(1));
assert_eq!(crate::simd::search_non_ident(b"01<4567890123456"), Some(2));
assert_eq!(crate::simd::search_non_ident(b"012<567890123456"), Some(3));
assert_eq!(crate::simd::search_non_ident(b"0123<67890123456"), Some(4));
assert_eq!(crate::simd::search_non_ident(b"01234<7890123456"), Some(5));
assert_eq!(crate::simd::search_non_ident(b"012345<890123456"), Some(6));
assert_eq!(crate::simd::search_non_ident(b"0123456<90123456"), Some(7));
assert_eq!(crate::simd::search_non_ident(b"01234567<0123456"), Some(8));
assert_eq!(crate::simd::search_non_ident(b"012345678<123456"), Some(9));
assert_eq!(crate::simd::search_non_ident(b"0123456789<23456"), Some(10));
assert_eq!(crate::simd::search_non_ident(b"0123456789a<3456"), Some(11));
assert_eq!(crate::simd::search_non_ident(b"0123456789ab<456"), Some(12));
assert_eq!(crate::simd::search_non_ident(b"0123456789abc<56"), Some(13));
assert_eq!(crate::simd::search_non_ident(b"0123456789abcd<6"), Some(14));
assert_eq!(crate::simd::search_non_ident(b"0123456789abcde<"), Some(15));
// Test special HTML/XML characters that are common non-identifiers.
assert_eq!(crate::simd::search_non_ident(b"tag<"), Some(3));
assert_eq!(crate::simd::search_non_ident(b"tag>"), Some(3));
assert_eq!(crate::simd::search_non_ident(b"tag "), Some(3)); // Space
assert_eq!(crate::simd::search_non_ident(b"tag="), Some(3)); // Equals
assert_eq!(crate::simd::search_non_ident(b"tag\""), Some(3)); // Quote
assert_eq!(crate::simd::search_non_ident(b"tag'"), Some(3)); // Single quote
assert_eq!(crate::simd::search_non_ident(b"tag/"), None);
assert_eq!(crate::simd::search_non_ident(b"tag:"), None);
assert_eq!(crate::simd::search_non_ident(b"tag+"), None);
// Test long strings with non-identifier at various positions.
let long_ident = b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_";
assert_eq!(crate::simd::search_non_ident(long_ident), None);
// 64 bytes, all identifiers.
let mut buf = [b'a'; 64];
assert_eq!(crate::simd::search_non_ident(&buf), None);
// Non-identifier at position 63.
buf[63] = b'<';
assert_eq!(crate::simd::search_non_ident(&buf), Some(63));
// Non-identifier at position 32 (start of 3rd chunk).
buf[63] = b'a';
buf[32] = b'<';
assert_eq!(crate::simd::search_non_ident(&buf), Some(32));
}
}
mod bytes {
use crate::bytes::*;
#[test]
fn from_str() {
let x = Bytes::from("hello");
assert_eq!(x.as_bytes(), b"hello");
}
#[test]
fn from_bytes() {
let x = Bytes::from(b"hello" as &[u8]);
assert_eq!(x.as_bytes(), b"hello");
}
#[test]
fn as_bytes_borrowed() {
let xb = Bytes::from(b"hello" as &[u8]);
assert_eq!(xb.as_bytes_borrowed(), Some(b"hello" as &[u8]));
let mut xc = xb.clone();
xc.set("test2").unwrap();
assert_eq!(xc.as_bytes_borrowed(), None);
}
#[test]
fn as_utf8_str() {
assert_eq!(Bytes::from("hello").as_utf8_str(), "hello");
}
#[test]
fn clone_shallow() {
// cloning a borrowed slice does not deep-clone
let x = Bytes::from("hello");
let xp = x.as_ptr();
let y = x.clone();
let yp = y.as_ptr();
assert_eq!(xp, yp);
}
#[test]
fn drop_old_owned() {
let mut x = Bytes::from("");
x.set("test").unwrap();
x.set("test2").unwrap();
}
#[test]
fn clone_owned_deep() {
let mut x = Bytes::from("");
x.set("hello").unwrap();
let xp = x.as_ptr();
let y = x.clone();
let yp = y.as_ptr();
assert_eq!(x, y);
assert_ne!(xp, yp);
}
#[test]
fn empty() {
let _x = Bytes::new();
}
#[test]
fn empty_set() {
let mut x = Bytes::new();
x.set("hello").unwrap();
}
#[test]
fn set() {
let mut x = Bytes::from("hello");
let xp = x.as_ptr();
x.set("world").unwrap();
let xp2 = x.as_ptr();
// check that the changes are reflected
assert_eq!(x.as_bytes(), b"world");
// pointer must be different now as the call to `set` should cause an allocation
assert_ne!(xp, xp2);
}
#[test]
fn clone_deep() {
let x = Bytes::from("hello");
let xp = x.as_ptr();
let mut y = x.clone();
y.set("world").unwrap();
let yp = y.as_ptr();
assert_ne!(xp, yp);
}
#[test]
fn into_owned_bytes() {
let mut x1 = Bytes::new();
x1.set("hello").unwrap(); // &str
let mut x2 = x1.clone();
x2.set(b"world" as &[u8]).unwrap(); // &[u8]
let mut x3 = x1.clone();
x3.set(vec![0u8, 1, 2, 3, 4]).unwrap(); // Vec
let mut x4 = x1.clone();
x4.set(vec![0u8, 1, 2, 3, 4].into_boxed_slice()).unwrap(); // Box<[u8]>
let mut x5 = x1.clone();
x5.set(String::from("Tests are important")).unwrap(); // String
}
}
#[test]
fn valueless_attribute() {
// https://github.com/y21/tl/issues/11
let input = r#"
<a id="u54423" contenteditable>Hello World</a>
"#;
let dom = parse(input, ParserOptions::default()).unwrap();
let element = dom.get_element_by_id("u54423");
assert!(element.is_some());
}
#[test]
fn valueless_attribute_next_attribute() {
// https://github.com/y21/tl/issues/70
let input = r#"<button disabled id="btn"></button>"#;
let dom = parse(input, ParserOptions::default()).unwrap();
let element = dom.get_element_by_id("btn");
assert!(element.is_some());
}
#[test]
fn unquoted() {
// https://github.com/y21/tl/issues/12
let input = r#"
<a id=u54423>Hello World</a>
"#;
let dom = parse(input, ParserOptions::default()).unwrap();
let parser = dom.parser();
let element = dom.get_element_by_id("u54423");
assert_eq!(
element.and_then(|x| x.get(parser).map(|x| x.inner_text(parser))),
Some("Hello World".into())
);
}
#[test]
fn unquoted_href() {
// https://github.com/y21/tl/issues/12
let input = r#"
<a id=u54423 href=https://www.google.com>Hello World</a>
"#;
let dom = parse(input, ParserOptions::default()).unwrap();
let parser = dom.parser();
let element = dom.get_element_by_id("u54423");
assert_eq!(
element.and_then(|x| x.get(parser).map(|x| x
.as_tag()
.unwrap()
.attributes()
.get("href")
.flatten()
.unwrap()
.try_as_utf8_str()
.unwrap()
.to_string())),
Some("https://www.google.com".into())
);
}
#[test]
fn unquoted_self_closing() {
// https://github.com/y21/tl/issues/12
let input = r#"
<a id=u54423 />
"#;
let dom = parse(input, ParserOptions::default()).unwrap();
let element = dom.get_element_by_id("u54423");
assert!(element.is_some());
// According to MDN, if there's no space between an unquoted attribute and the closing tag,
// the slash is treated as part of the attribute value.
let input = r#"
<a id=u54423/>
"#;
let dom = parse(input, ParserOptions::default()).unwrap();
let element = dom.get_element_by_id("u54423/");
assert!(element.is_some());
}
mod query_selector {
use super::*;
#[test]
fn query_selector_simple() {
let input = "