module Nokogiri
Nokogiri parses and searches XML/HTML very quickly, and also has correctly implemented CSS3 selector support as well as XPath 1.0 support.
Parsing a document returns either a Nokogiri::XML::Document or a Nokogiri::HTML4::Document, depending on the kind of document you parse.
Here is an example:
require 'nokogiri'
require 'open-uri'

# Get a Nokogiri::HTML4::Document for the page we’re interested in...

doc = Nokogiri::HTML4(URI.open('http://www.google.com/search?q=tenderlove'))

# Do funky things with it using Nokogiri::XML::Node methods...

####
# Search for nodes by css
doc.css('h3.r a.l').each do |link|
  puts link.content
end
See Nokogiri::XML::Searchable#css for more information about CSS searching. See Nokogiri::XML::Searchable#xpath for more information about XPath searching.
Constants
- HTML
- VERSION
  The version of Nokogiri you are using
- VERSION_INFO
  More complete version information about libxml
Public Class Methods
Parse HTML. Convenience method for Nokogiri::HTML4::Document.parse
# File lib/nokogiri/html4.rb, line 6
def HTML4(input, url = nil, encoding = nil, options = XML::ParseOptions::DEFAULT_HTML, &block)
  Nokogiri::HTML4::Document.parse(input, url, encoding, options, &block)
end
Parse an HTML5 document. Convenience method for Nokogiri::HTML5::Document.parse
Since: v1.12.0
Note: HTML5 functionality is not available when running JRuby.
# File lib/nokogiri/html5.rb, line 27
def self.HTML5(input, url = nil, encoding = nil, **options, &block)
  Nokogiri::HTML5::Document.parse(input, url, encoding, **options, &block)
end
Parse a document and add the Slop decorator. The Slop decorator implements method_missing such that methods may be used instead of CSS or XPath. For example:
doc = Nokogiri::Slop(<<-eohtml)
  <html>
    <body>
      <p>first</p>
      <p>second</p>
    </body>
  </html>
eohtml
assert_equal('second', doc.html.body.p[1].text)
# File lib/nokogiri.rb, line 83
def Slop(*args, &block)
  Nokogiri(*args, &block).slop!
end
Parse XML. Convenience method for Nokogiri::XML::Document.parse
# File lib/nokogiri/xml.rb, line 6
def XML(thing, url = nil, encoding = nil, options = XML::ParseOptions::DEFAULT_XML, &block)
  Nokogiri::XML::Document.parse(thing, url, encoding, options, &block)
end
Create a Nokogiri::XSLT::Stylesheet with stylesheet.
Example:
xslt = Nokogiri::XSLT(File.read(ARGV[0]))
# File lib/nokogiri/xslt.rb, line 11
def XSLT stylesheet, modules = {}
  XSLT.parse(stylesheet, modules)
end
# File lib/nokogiri.rb, line 87
def install_default_aliases
  # Make sure to support some popular encoding aliases not known by
  # all iconv implementations.
  {
    "Windows-31J" => "CP932", # Windows-31J is the IANA registered name of CP932.
  }.each do |alias_name, name|
    EncodingHandler.alias(name, alias_name) if EncodingHandler[alias_name].nil?
  end
end
Create a new Nokogiri::XML::DocumentFragment
# File lib/nokogiri.rb, line 60
def make(input = nil, opts = {}, &blk)
  if input
    Nokogiri::HTML4.fragment(input).children.first
  else
    Nokogiri(&blk)
  end
end
Parse an HTML or XML document. string contains the document.
# File lib/nokogiri.rb, line 42
def parse(string, url = nil, encoding = nil, options = nil)
  if string.respond_to?(:read) ||
      /^\s*<(?:!DOCTYPE\s+)?html[\s>]/i === string[0, 512]
    # Expect an HTML indicator to appear within the first 512
    # characters of a document. (<?xml ?> + <?xml-stylesheet ?>
    # shouldn't be that long)
    Nokogiri.HTML4(string, url, encoding, options || XML::ParseOptions::DEFAULT_HTML)
  else
    Nokogiri.XML(string, url, encoding, options || XML::ParseOptions::DEFAULT_XML)
  end.tap do |doc|
    yield doc if block_given?
  end
end
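The HTML-versus-XML decision in the listing above can be explored on its own, without Nokogiri loaded. The sketch below copies the regular expression and the `respond_to?(:read)` check from that listing into a lambda; the name `looks_like_html` and the sample inputs are hypothetical and exist only for illustration.

```ruby
require 'stringio'

# Same pattern as in the parse listing: an <html> or <!DOCTYPE html
# tag at the start of a line.
html_indicator = /^\s*<(?:!DOCTYPE\s+)?html[\s>]/i

# Hypothetical helper mirroring the condition in parse: IO-like input is
# treated as HTML; strings are checked in their first 512 characters.
looks_like_html = lambda do |input|
  input.respond_to?(:read) || html_indicator === input[0, 512]
end

doctype_page = '<!DOCTYPE html><html><body></body></html>'
xml_feed     = "<?xml version=\"1.0\"?>\n<feed></feed>"
nested_html  = '<catalog><html>nested</html></catalog>'
io_input     = StringIO.new('<feed/>')

puts looks_like_html.call(doctype_page)
puts looks_like_html.call(xml_feed)
puts looks_like_html.call(nested_html)
puts looks_like_html.call(io_input)
```

Under this condition any object that responds to `read` takes the HTML branch regardless of its content, so calling Nokogiri::XML directly is the safer choice when an IO is known to contain XML.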
# File lib/nokogiri/version/info.rb, line 200
def self.uses_gumbo?
  uses_libxml? # TODO: replace with Gumbo functionality
end