smolweb-validator is a new CLI tool that checks whether a web page respects the smolweb HTML subset.
This tool uses the jsoup HTML5 parser, a Java library.
Example:
$ ./smolweb-validate https://adele.pages.casa/md/
Validating: https://adele.pages.casa/md/
━━━━━━━━━━━━━━
VALIDATION SUMMARY
━━━━━━━━━━━━━━
✅ VALID: This page conforms to the smolweb HTML subset!
Statistics:
Total elements: 71
Valid elements: 71
Invalid elements: 0
━━━━━━━━━━━━━━
Validation complete.
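To give an idea of what such a check involves, here is a minimal sketch in Python using only the standard library. It is not how smolweb-validator itself works (that tool is built on jsoup in Java). The `ALLOWED_TAGS` set, the `SubsetCounter` class and the `summarize` function are hypothetical names made up for this illustration, and the allow-list is not the real smolweb subset.

```python
from html.parser import HTMLParser

# Hypothetical allow-list for illustration only; NOT the real smolweb subset.
ALLOWED_TAGS = {"html", "head", "title", "body", "h1", "p", "a",
                "ul", "li", "img", "br"}


class SubsetCounter(HTMLParser):
    """Count start tags and record the ones outside ALLOWED_TAGS."""

    def __init__(self):
        super().__init__()
        self.total = 0
        self.invalid = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        self.total += 1
        if tag not in ALLOWED_TAGS:
            self.invalid.append(tag)


def summarize(html_text):
    """Return element counts in the spirit of the summary shown above."""
    counter = SubsetCounter()
    counter.feed(html_text)
    counter.close()
    return {
        "total": counter.total,
        "valid": counter.total - len(counter.invalid),
        "invalid": len(counter.invalid),
    }


if __name__ == "__main__":
    page = "<html><body><p>Hi <marquee>there</marquee></p></body></html>"
    print(summarize(page))
```

In this sketch an element counts as invalid only because of its tag name. A real validator would presumably also look at attributes and nesting.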
7 replies
@adele thanks, this is great! :)
Do you think it would be useful to specify a custom DTD for validation? https://en.wikipedia.org/wiki/Document_type_definition
Not sure if the W3C validator can be used with a custom DTD pointing to a smolweb URL. Here is the one for XHTML 1.0 Strict as an example: https://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd
I checked a simple case with an XML file using an embedded DTD with xmllint (in the libxml2 package) and it detects errors as expected.
@dillo the problem with a DTD is that the document has to be a well-formed XML document. This constraint is too heavy for handwritten HTML.
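The well-formedness constraint is easy to see with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`, which checks well-formedness only and does not validate against a DTD. The helper name `is_well_formed_xml` and the two sample strings are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(markup):
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Typical handwritten HTML: <br> has no closing slash.
handwritten = "<p>First line<br>second line</p>"
# The same content written the XHTML way.
xhtml_style = "<p>First line<br/>second line</p>"

if __name__ == "__main__":
    print(is_well_formed_xml(handwritten))
    print(is_well_formed_xml(xhtml_style))
```

The first string is rejected before any DTD rule could even be applied, because an XML parser treats the bare `<br>` as an element that is never closed.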
@adele are you referring to leaving some tags open? If so, I believe that would require a complicated parsing/recovery strategy:
https://html.spec.whatwg.org/multipage/parsing.html#misnested-tags:-b-i-/b-/i
https://html.spec.whatwg.org/multipage/parsing.html#adoptionAgency
https://html.spec.whatwg.org/multipage/parsing.html#reconstruct-the-active-formatting-elements
@dillo I think XHTML was a wrong turn in the history of HTML. Before and after the XHTML period, «IMG», «HR», «BR», «INPUT»… without a closing / are simpler to write and more logical (these tags never enclose anything, so why would it be necessary to close them?). All browsers already understand them.
@adele I think the HTML tags got filtered out of your toot. I mean not closing tags, as in this case (with "<>" replaced by "()"):
(ul)
(li)Item one
(li)Item two
(/ul)
One of the complexities of the parser in Dillo (and other HTML5 parsers) is that it needs to determine which tags close which previously opened tags. Sometimes an unclosed tag is the result of a typo, not intentional.
I would like to get rid of such complexities in parsing, but maybe that would be outside the scope of the smolweb goal.
@dillo sorry, I have corrected my previous toot and used « » chars.
OK, I see your problem. This kind of syntax should not be used. I will add this to the smolweb rules and the validator. The same goes for the other problem you mentioned (such as «B»«I»text«/B»«/I»).
@dillo now smolweb-validator checks for misnested and unclosed tags
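For readers wondering what a check for misnested and unclosed tags can look like, here is a minimal stack-based sketch in Python. It is only an illustration under stated assumptions, not the code used by smolweb-validator. The names `StrictNestingChecker` and `check_nesting` are hypothetical, and the rule applied here (every non-void element must be closed explicitly, in order) is an assumption about what a strict rule could be.

```python
from html.parser import HTMLParser

# Void elements never take an end tag in HTML, so they are never pushed.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class StrictNestingChecker(HTMLParser):
    """Report end tags that do not match the innermost open element."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open non-void elements
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/>: nothing stays open.
        pass

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return  # a stray end tag for a void element is ignored here
        if not self.stack:
            self.errors.append(f"unexpected </{tag}>")
        elif self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.errors.append(
                f"</{tag}> found while <{self.stack[-1]}> is still open")
            # Recover by closing everything up to the matching start tag.
            if tag in self.stack:
                while self.stack.pop() != tag:
                    pass


def check_nesting(html_text):
    """Return a list of nesting problems; an empty list means none found."""
    checker = StrictNestingChecker()
    checker.feed(html_text)
    checker.close()
    leftovers = [f"<{tag}> is never closed" for tag in checker.stack]
    return checker.errors + leftovers


if __name__ == "__main__":
    for sample in ("<ul><li>Item one<li>Item two</ul>",
                   "<b><i>text</b></i>",
                   "<ul><li>Item one</li><li>Item two</li></ul>"):
        print(sample, check_nesting(sample))
```

Unlike a full HTML5 parser, this sketch never guesses which open element an end tag was meant to close. It only reports the mismatch, which is what makes the strict rule cheap to check.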