String with URL and Email ignores finder.url_must_have_scheme ? #44

Closed
thomasgloe opened this issue Jul 7, 2022 · 8 comments

@thomasgloe

thomasgloe commented Jul 7, 2022

Hi,

I tested the example from the docs and it works great:

use linkify::LinkFinder;

let input = "Look, no scheme: example.org/foo";
let mut finder = LinkFinder::new();

// true by default
finder.url_must_have_scheme(false);

let links: Vec<_> = finder.links(input).collect();
assert_eq!(links[0].as_str(), "example.org/foo");

However, when the input string is changed to

let input = "Look, no scheme: example.org/foo email@foo.com";

the URL example.org/foo is no longer detected. The demo website https://robinst.github.io/linkify/ shows the same behavior with this input string.

Is this the expected outcome or a bug? Is there an additional switch to detect the URL even if the string also contains an email address?

Related to #8

@thomasgloe
Author

As a workaround, it is possible to split the string into spans and run the finder on each span separately:

use linkify::LinkFinder;

let input = "Look, no scheme: example.org/foo Email: email@foo.com";
let mut finder = LinkFinder::new();

// Also accept URLs without a scheme, such as "example.org/foo".
finder.url_must_have_scheme(false);

// Split the input into spans first, then search each span on its own.
for span in finder.spans(input) {
    for link in finder.links(span.as_str()) {
        println!(" - link: {}", link.as_str());
    }
}

@mre
Contributor

mre commented Jul 7, 2022

Not sure if it helps, but can you also check against #43?

@thomasgloe
Author

thomasgloe commented Jul 7, 2022

Hm, I've switched to a simple regex approach because I observed additional issues in my test cases, and even the workaround above did not fix all of them. The URLs in my data are not too complicated.
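
The exact expression used is not shown in this thread. As a rough idea of what a "simple" scanner for uncomplicated data might look like, here is a minimal sketch that uses only the standard library: it splits on whitespace and applies naive rules instead of a regex. The names Kind, find_candidates and looks_like_domain are hypothetical and not part of linkify, and the rules are assumptions (no scheme handling, ASCII-only domains, no validation against real TLDs).

```rust
/// Kind of candidate found by the naive scanner (hypothetical, not a linkify type).
#[derive(Debug, PartialEq)]
enum Kind {
    Url,
    Email,
}

/// Naive domain check: at least two non-empty labels made of ASCII
/// letters, digits or '-', and an alphabetic last label of length >= 2.
fn looks_like_domain(s: &str) -> bool {
    let labels: Vec<&str> = s.split('.').collect();
    labels.len() >= 2
        && labels
            .iter()
            .all(|l| !l.is_empty() && l.chars().all(|c| c.is_ascii_alphanumeric() || c == '-'))
        && labels
            .last()
            .map_or(false, |tld| tld.len() >= 2 && tld.chars().all(|c| c.is_ascii_alphabetic()))
}

/// Split the input on whitespace and classify each token.
fn find_candidates(input: &str) -> Vec<(Kind, &str)> {
    let mut found = Vec::new();
    for raw in input.split_whitespace() {
        // Strip surrounding punctuation such as parentheses, commas or colons.
        let token = raw.trim_matches(|c: char| !c.is_alphanumeric());
        if token.is_empty() {
            continue;
        }
        if let Some((local, domain)) = token.split_once('@') {
            // Something like "user@host.tld" is treated as an email address.
            if !local.is_empty() && looks_like_domain(domain) {
                found.push((Kind::Email, token));
            }
        } else {
            // For URLs without a scheme, only the part before the first '/' must be a domain.
            let host = token.split('/').next().unwrap_or("");
            if looks_like_domain(host) {
                found.push((Kind::Url, token));
            }
        }
    }
    found
}
```

Calling find_candidates with the two input strings from this issue yields one URL and one email candidate each. A sketch like this trades correctness for simplicity: it will miss URLs with a scheme and accept many strings that are not real links, so it only suits data that is known to be simple.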

@robinst
Owner

robinst commented Jul 7, 2022

Yep, #43 fixes that problem too. I'll add that as a test case.

@thomasgloe can you provide the additional problematic cases that you've found here?

@thomasgloe
Author

Code example:

use linkify::{Link, LinkFinder};

fn find_links(input: &str) -> Vec<Link> {
    let mut finder = LinkFinder::new();
    finder.url_must_have_scheme(false);

    let mut links = Vec::new();
    let spans: Vec<_> = finder.spans(input).collect();
    for span in spans {
        // added second finder, to test if this makes any difference - it does not.
        let mut finder2 = LinkFinder::new();
        finder2.url_must_have_scheme(false);
        let mut tlinks: Vec<_> = finder2.links(span.as_str()).collect();
        links.append(&mut tlinks);
    }

    links
}

fn main() {
    // multiline input string
    let input = "Web:
www.foobar.co
E-Mail:
      bar@foobar.co (bla bla bla)";

    let links = find_links(input);
    for link in links {
        println!(" - link: {}", link.as_str());
    }
}

results in:

 - link: Web:
www.foobar.co
 - link: bar@foobar.co

But I would expect:

 - link: www.foobar.co
 - link: bar@foobar.co

@thomasgloe
Author

Indeed, I've checked with

linkify = { git = "https://github.com/robinst/linkify", branch = "check-domains" }

and the problematic case above seems to work now.

@robinst
Owner

robinst commented Jul 11, 2022

Good to hear! To be honest, the implementation of the url_must_have_scheme(false) mode had a few problems before. With the branch, its logic is now unified with the others and much cleaner.

I'm releasing the change soon.

@robinst
Owner

robinst commented Jul 11, 2022

robinst closed this as completed Jul 11, 2022