Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix utf8 handling in C code #3

Merged
merged 2 commits into from
Feb 23, 2017
Merged

Conversation

gfx
Copy link

@gfx gfx commented Jan 30, 2017

Fix #2, which comes in redcarpet.

Some tests are borrowed from vmg#464

@@ -22,6 +22,7 @@ Gem::Specification.new do |s|
s.require_paths = ["lib"]

s.add_development_dependency "nokogiri", "~> 1.6.0"
s.add_development_dependency "rake", "~> 10.0"
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NOTE: I can't build it with the latest rake, so fixed the version to v10.

@gfx
Copy link
Author

gfx commented Jan 30, 2017

CI fails because of activesupport dependency. How can I fix it?

@gfx
Copy link
Author

gfx commented Feb 14, 2017

@yujinakayama ping?

@yujinakayama
Copy link

Sorry for the late response. Right now I'm busy but will review in this week 🙏

@yujinakayama yujinakayama mentioned this pull request Feb 20, 2017
@yujinakayama
Copy link

CI fails because of activesupport dependency. How can I fix it?

The CI issues are fixed in the current master. Could you please rebase against the master? 🙏

@@ -223,7 +223,7 @@ sd_autolink__email(
return 0;

for (link_end = 0; link_end < size; ++link_end) {
uint8_t c = data[link_end];
char c = data[link_end];

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand this change forces the c to be virtually 7-bit ASCII but it's a bit obscure. Maybe the way in vmg#463 and defining a macro something like is_ascii() (c < 0x7f) would be more explicit.

Copy link
Author

@gfx gfx Feb 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand this change forces the c to be virtually 7-bit ASCII

No, it might be a part of UTF-8's middle bytes, and the change of the type doesn't change the bit pattern of c.

Because c is dealt as char, it should be char, not uint8_t (i.e. unsigned char). I believe this code is the best.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All right 👍

@gfx
Copy link
Author

gfx commented Feb 21, 2017

Rebased.

@yujinakayama yujinakayama merged commit 6451128 into increments:master Feb 23, 2017
@yujinakayama
Copy link

Thanks!

@yujinakayama
Copy link

Released as v3.2.2.2.

@gfx
Copy link
Author

gfx commented Feb 24, 2017

Thanks!

@gfx gfx deleted the fix_utf8_handling branch February 24, 2017 01:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants