Extracting embedded JSON objects with rogue elements #2805

Answered by flavorjones
forthrin asked this question in Q&A
Hi, @forthrin, thanks for asking this question!

Nokogiri specifically wraps XML and HTML parsers, so unfortunately it has no capability to parse JavaScript.

That said, you may want to look at something like https://github.com/nene/rkelly-remix, which is a pure-Ruby JavaScript parser. You should be able to use it to examine the contents of a <script> tag:

#! /usr/bin/env ruby

require "bundler/inline"

gemfile do
  source "https://rubygems.org"
  gem "rkelly-remix"
  gem "nokogiri"
end

require "rkelly"

markup = '<script>foo={"a": 1, "b": undefined, "c": function(){}};</script><script>bar={"baz": 2};</script>'
html_doc = Nokogiri.HTML5(markup)

script = html_doc.at_css("script")
script.co…
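
As a side note on why a JavaScript parser is suggested here rather than a JSON parser, the sketch below feeds the two assignments from the markup above to Ruby's standard `json` library. The helper name `parse_assigned_literal` is hypothetical (it is not part of Nokogiri or rkelly-remix), and it assumes each script body is a single `name=literal;` statement, as in the example markup.

```ruby
require "json"

# Hypothetical helper: strips a leading `name=` and a trailing `;` from a
# single-statement JavaScript assignment, then attempts a strict JSON parse.
# Returns nil when the remaining text is not valid JSON.
def parse_assigned_literal(source)
  literal = source.sub(/\A\s*[A-Za-z_\$][\w\$]*\s*=\s*/, "").sub(/;\s*\z/, "")
  JSON.parse(literal)
rescue JSON::ParserError
  nil
end

clean = 'bar={"baz": 2};'
rogue = 'foo={"a": 1, "b": undefined, "c": function(){}};'

p parse_assigned_literal(clean) # parses: the literal is plain JSON
p parse_assigned_literal(rogue) # nil: `undefined` and `function(){}` are not JSON
```

The second call comes back as `nil` because `undefined` and function expressions are JavaScript, not JSON, which is the case where a real JavaScript parser such as rkelly-remix becomes useful.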

Answer selected by flavorjones

This discussion was converted from issue #2804 on March 01, 2023 13:32.