Adële 🐁! (@adele@social.pollux.casa)

smolweb-validator is a new CLI tool that checks whether a web page respects the smolweb HTML subset.

This tool uses the jsoup HTML5 parser, a Java library.

Example:

$ ./smolweb-validate https://adele.pages.casa/md/

🔍 Validating: https://adele.pages.casa/md/
━━━━━━━━━━━━━━

📊 VALIDATION SUMMARY
━━━━━━━━━━━━━━
✅ VALID: This page conforms to the smolweb HTML subset!

📈 Statistics:
  Total elements: 71
  Valid elements: 71
  Invalid elements: 0

━━━━━━━━━━━━━━
Validation complete.

#smolweb #smallweb

Published: Aug 17, 2025, 15:48*
Visibility: Public
Replies: 4

Language: English
Favourites: 37
Reblogs: 14
Edit timeline: Edited Aug 17, 2025, 15:48; Published Aug 17, 2025, 15:48
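
The kind of check described in the post above can be sketched in a few lines. This is only an illustration in Python using the standard library's html.parser, not the tool's actual code (which, per the post, is built on jsoup in Java). The `ALLOWED` set and the `SubsetChecker` class are hypothetical; the real list of permitted tags is the one published in the smolweb specs.

```python
from html.parser import HTMLParser

# Hypothetical allow-list for illustration only; the real smolweb subset
# is defined in the smolweb specs, not here.
ALLOWED = {"html", "head", "title", "body", "h1", "p", "a", "b", "i", "ul", "li", "br"}


class SubsetChecker(HTMLParser):
    """Counts start tags and records those outside the allow-list."""

    def __init__(self):
        super().__init__()
        self.total = 0
        self.invalid = []  # (tag, line) pairs

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names before calling this method.
        self.total += 1
        if tag not in ALLOWED:
            line, _col = self.getpos()
            self.invalid.append((tag, line))


def check(html):
    checker = SubsetChecker()
    checker.feed(html)
    checker.close()
    return checker.total, checker.invalid


total, invalid = check(
    "<html><body><p>Hi <b>there</b></p><marquee>no</marquee></body></html>"
)
print(f"Total elements: {total}")
print(f"Valid elements: {total - len(invalid)}")
print(f"Invalid elements: {len(invalid)}")
```

An allow-list keeps the rule simple: anything not explicitly permitted counts as invalid, so new or exotic tags are rejected by default.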

14 visible replies; 16 more replies hidden or not public

Dillo browser @dillo@fosstodon.org

@adele thanks, this is great! :)

Do you think it would be useful to specify a custom DTD for validation? https://en.wikipedia.org/wiki/Document_type_definition

Not sure if the W3C validator can be used with a custom DTD pointing to a smolweb URL. Here is the one for XHTML 1.0 Strict as an example: https://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd

I checked a simple case with an XML file using an embedded DTD with xmllint (in the libxml2 package) and it detects errors as expected.

% xmllint --valid test.xml
test.xml:9: element notbook: validity error : No declaration for element notbook
<notbook>Not a book</notbook>
test.xml:10: element books: validity error : Element books content does not follow the DTD, expecting (book)+, got (book book notbook )

1>
<books>
  <book>The Lord of the Rings</book>
  <book>Harry Potter</book>
  <notbook>Not a book</notbook>
</books>
%

Published: Aug 17, 2025, 22:49
Visibility: Public
Replies: 1

Language: English
Favourites: 0
Reblogs: 0
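
The content-model error that xmllint reports above can be mimicked without a DTD. The sketch below is a simplified stand-in, not real DTD validation: it assumes the test file had a `books` root with `book` children (as the error message suggests) and expresses the model `(book)+` as a regular expression over the child names. `check_children` is a hypothetical helper.

```python
import re
import xml.etree.ElementTree as ET

# Assumed reconstruction of the test document, based on the error message.
DOC = """<books>
  <book>The Lord of the Rings</book>
  <book>Harry Potter</book>
  <notbook>Not a book</notbook>
</books>"""


def check_children(xml_text, parent, content_model):
    """Check that the child names of every `parent` element match a regex."""
    root = ET.fromstring(xml_text)
    errors = []
    for node in root.iter(parent):
        names = " ".join(child.tag for child in node)
        if not re.fullmatch(content_model, names):
            errors.append(f"{parent}: expected {content_model}, got ({names})")
    return errors


# "(book)+" from the DTD, written as a regex over space-separated names.
errors = check_children(DOC, "books", r"book( book)*")
for message in errors:
    print(message)
```

A real DTD validator also checks attributes, entities and undeclared elements; this only covers the ordering and naming of direct children.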

Adële 🐁! @adele

@dillo the problem with a DTD is that the document has to be well-formed XML. This constraint is too heavy for handwritten HTML.

Published: Aug 18, 2025, 05:40
Visibility: Public
Replies: 1

Language: English
Favourites: 0
Reblogs: 0
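
The well-formedness constraint mentioned above is easy to see with any XML parser. A minimal sketch using Python's standard xml.etree.ElementTree (the `is_well_formed` helper is hypothetical, introduced only for this illustration): ordinary handwritten HTML with an unclosed «BR» is rejected, while the XHTML-style self-closed form is accepted.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


html_style = "<p>first line<br>second line</p>"    # valid HTML, not XML
xml_style = "<p>first line<br/>second line</p>"    # XHTML-style void tag

print(is_well_formed(html_style))  # False
print(is_well_formed(xml_style))   # True
```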

Dillo browser @dillo@fosstodon.org

@adele you are referring to leaving some tags open? If so, I believe that would require having a complicated parsing/recovery strategy:

https://html.spec.whatwg.org/multipage/parsing.html#misnested-tags:-b-i-/b-/i

https://html.spec.whatwg.org/multipage/parsing.html#adoptionAgency

https://html.spec.whatwg.org/multipage/parsing.html#reconstruct-the-active-formatting-elements

Published: Aug 18, 2025, 21:27
Visibility: Public
Replies: 1

Language: English
Favourites: 0
Reblogs: 0

Adële 🐁! @adele

@dillo I think XHTML was a wrong turn in the history of HTML. Before and after the XHTML period, «IMG», «HR», «BR», «INPUT»… without a closing / are simpler to write and more logical (these tags never enclose anything, so why would it be necessary to close them?). All browsers are already able to understand them.

Published: Aug 18, 2025, 21:49*
Visibility: Public
Replies: 1

Language: English
Favourites: 0
Reblogs: 0
Edit timeline: Edited Aug 19, 2025, 03:55; Published Aug 18, 2025, 21:49

Dillo browser @dillo@fosstodon.org

@adele I think the HTML tags got filtered in your toot. I mean not closing tags, as in this case (I replaced "<>" with "()"):

(ul)
(li)Item one
(li)Item two
(/ul)

One of the complexities of the parser in Dillo (and other HTML 5 parsers) is that it needs to determine which tags close which other previous tags. Sometimes this is the result of a typo, not intentional.

I would like to get rid of such complexities at parsing, but maybe that would be outside the scope of the smolweb goal.

Published: Aug 18, 2025, 22:03
Visibility: Public
Replies: 1

Language: English
Favourites: 0
Reblogs: 0
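
The two cases discussed above (an unclosed «LI» and overlapping «B»/«I») can both be caught with a plain stack, with no recovery strategy at all. The sketch below is an assumption about how such a strict check might look, written in Python with the standard html.parser; it is neither Dillo's parser nor the smolweb-validator code. `NestingChecker` and `find_nesting_problems` are hypothetical names, and `VOID` lists the HTML void elements that never take a closing tag.

```python
from html.parser import HTMLParser

# HTML void elements: they never enclose content, so they are never pushed.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class NestingChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []     # (tag, line) for each element still open
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append((tag, self.getpos()[0]))

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        line = self.getpos()[0]
        if tag not in [name for name, _ in self.stack]:
            self.problems.append(
                f"closing </{tag}> at line {line} has no matching opening tag")
            return
        # Everything opened after the matching tag was left unclosed.
        while self.stack:
            name, opened = self.stack.pop()
            if name == tag:
                break
            self.problems.append(
                f"<{name}> at line {opened} is still open "
                f"when closing </{tag}> at line {line}")


def find_nesting_problems(html):
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    for name, opened in checker.stack:
        checker.problems.append(f"<{name}> at line {opened} was never closed")
    return checker.problems


for problem in find_nesting_problems("<ul>\n<li>Item one\n<li>Item two\n</ul>"):
    print(problem)
for problem in find_nesting_problems("<b><i>text</b></i>"):
    print(problem)
```

Rejecting these documents outright, instead of guessing which tag closes which, is what lets a small parser skip the recovery rules of the HTML5 parsing algorithm.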

Adële 🐁! @adele

@dillo sorry, I have corrected my previous toot and used « » chars.

OK, I see your problem. This kind of syntax should not be used. I will add this to the smolweb rules and the validator. Same thing for the other problem you mentioned (such as «B»«I»text«/B»«/I»).

Published: Aug 19, 2025, 04:04
Visibility: Public
Replies: 2

Language: English
Favourites: 0
Reblogs: 0

Adële 🐁! @adele

@dillo now smolweb-validator checks for misnested and unclosed tags

🔍 Validating: https://smolweb.org/badpage.html
---

📊 VALIDATION SUMMARY
---
❌ INVALID: This page does not conform to the smolweb HTML subset.

📈 Statistics:
Total elements: 36
Valid elements: 36
Invalid elements: 0

🔀 Misnested Tags (7):
⚠️ HTML tags are improperly nested (e.g., text)
1. Misnested tags detected: at line 23 is still open when closing at line 24. Tags must be properly nested.
2. Tag at line 23 was not properly closed before
3. Tag at line 22 was not properly closed before
4. Tag at line 21 was not properly closed before
5. Misnested tags detected: at line 29 is still open when closing at line 29. Tags must be properly nested.
6. Tag at line 29 was not properly closed before
7. Closing tag at line 29 has no matching opening tag

✅ Valid tags used:
• : 5 occurrence(s)
• : 5 occurrence(s)
• : 4 occurrence(s)
• : 4 occurrence(s)
• : 2 occurrence(s)
• : 2 occurrence(s)
• : 1 occurrence(s)
• : 1 occurrence(s)
• : 1 occurrence(s)
• : 1 occurrence(s)
• : 1 occurrence(s)
• : 1 occurrence(s)
• : 1 occurrence(s)
•

Published: Aug 19, 2025, 05:57
Visibility: Public

Language: English
Replies: 0
Favourites: 0
Reblogs: 0

Adële 🐁! @adele

@dillo v1.2.1 checks more parsing problems

Published: Aug 19, 2025, 14:16
Visibility: Public
Replies: 1

Language: English
Favourites: 0
Reblogs: 0

Dillo browser @dillo@fosstodon.org

@adele sorry for the delay, thanks for the update!

Published: Aug 21, 2025, 18:54
Visibility: Public

Language: English
Replies: 0
Favourites: 1
Reblogs: 0

Evil Love 『Vulonkaaz』 @vulonkaaz@v2.flyingcube.tech

@adele there's no way your standard got <b> and <i> but not <s>

Published: Aug 18, 2025, 09:53
Visibility: Public
Replies: 1

Language: English
Favourites: 0
Reblogs: 0

Adële 🐁! @adele

@vulonkaaz

The s tag is not part of W3C XHTML Basic, unlike b and i.

As explained in the smolweb HTML subset guide, this subset is inspired by earlier work from the W3C: XHTML Basic.

The XHTML Basic document type includes the minimal set of modules required to be an XHTML host language document type, and in addition it includes images, forms, basic tables, and object support. It is designed for Web clients that do not support the full set of XHTML features; for example, Web clients such as mobile phones, PDAs, pagers, and set top boxes. The document type is rich enough for content authoring.

For better compatibility, it is not a good idea to specify the XHTML Basic 1.1 DTD in the doctype for smolwebsites.

Some deprecated tags (acronym, big, tt) have been removed from this list. Object and param tags have been banned to avoid the inclusion of specific code such as Java applets.

As specified in the Guidelines, semantic tags introduced in more recent HTML versions have been added to provide better accessibility.

https://smolweb.org/specs/index.html

https://www.w3.org/TR/xhtml-basic/#s_xhtmlmodules

Published: Aug 18, 2025, 12:40
Visibility: Public
Replies: 1

Language: English
Favourites: 0
Reblogs: 0

Adële 🐁! @adele

@vulonkaaz However, adding the s tag could be a good idea.

I will check whether it is well supported by old and tiny/basic browsers. If it is ignored, it would be dangerous not to see that a text is struck through.

Published: Aug 18, 2025, 12:52
Visibility: Public
Replies: 1

Language: English
Favourites: 1
Reblogs: 0

Evil Love 『Vulonkaaz』 @vulonkaaz@v2.flyingcube.tech

@adele tried with dillo, emacs and lynx all handle my <s> just fine

think it would make sense to add it

Published: Aug 18, 2025, 12:57
Visibility: Public
Replies: 1

Language: English
Favourites: 1
Reblogs: 0

Evil Love 『Vulonkaaz』 @vulonkaaz@v2.flyingcube.tech

@adele elinks doesn't show any sign that text is supposed to be struck through tho

Published: Aug 18, 2025, 12:59
Visibility: Public

Language: English
Replies: 0
Favourites: 1
Reblogs: 0