Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parsing csharp files does not recognize the language #297

Open
dpordomingo opened this issue Mar 6, 2019 · 21 comments
Open

Parsing csharp files does not recognize the language #297

dpordomingo opened this issue Mar 6, 2019 · 21 comments
Labels
bug Something isn't working empathy-sessions Issue filed as as part of empathy sessions

Comments

@dpordomingo
Copy link
Contributor

dpordomingo commented Mar 6, 2019

discovered at https://github.com/src-d/empathy-sessions/issues/37#issuecomment-470045137

Running parse over a .cs file is not working

$ srcd parse uast examples/example.cs
INFO[0000] detected language: c#                        
Error: language c# is not supported
Usage:
  srcd parse uast [file-path] [flags]
...

Workaround

Until this is fixed, csharp needs to be passed explicitly to bblfshd. For example:

srcd parse uast csharp.cs --lang csharp
@dpordomingo dpordomingo added bug Something isn't working empathy-sessions Issue filed as as part of empathy sessions labels Mar 6, 2019
@smacker smacker self-assigned this Mar 12, 2019
@smacker
Copy link
Contributor

smacker commented Mar 12, 2019

This is a known problem in bblfsh:

We use supported languages API to provide meaningful error to the user.

So the question is: do we want to implement workaround inside engine specially for C#?

//cc @creachadair maybe you have some ETA for the proper fix on bblfsh side?

@creachadair
Copy link
Contributor

My understanding was that the fix from bblfsh/bblfshd#251 was supposed to address this without requiring a change to Engine. To really fix this we will need a solution for aliasing, for which we don't have a timeline yet (@bzz has started an RFC for the question, but we have had higher-priority work so far).

Is the above workaround not working for you?

@smacker
Copy link
Contributor

smacker commented Mar 12, 2019

it does not. We use enry + bblfsh supported languages api to check that bblfsh is able to parse a file. This api currently doesn't return c#, only csharp.

@creachadair
Copy link
Contributor

it does not. We use enry + bblfsh supported languages api to check that bblfsh is able to parse a file. This api currently doesn't return c#, only csharp.

I see. So is the issue not that you're unable to get a parse for the file, but that the list of languages doesn't match the string on the Engine side so that Engine doesn't even ask?

@smacker
Copy link
Contributor

smacker commented Mar 12, 2019

@creachadair yes

jfyi workaround from above also doesn't solve the issue for srcd sql and for gemini (which isn't part of engine yet).

@carlosms
Copy link
Contributor

I added this as a known issue to the release notes.

@smacker
Copy link
Contributor

smacker commented Mar 13, 2019

I would propose to return this issue to backlog and wait for the fix from LA team instead of implementing work around on our side due to:

  • we can work around only parse command, not gitbase
  • there is a chance we will remove parse command completely from engine

@carlosms wdyt?

@carlosms
Copy link
Contributor

Makes sense, let's wait for a proper fix, I don't think there is any rush.

@carlosms
Copy link
Contributor

It also fails for cpp (c++ is not supported) and bash (shell is not supported).

@smacker smacker removed their assignment Mar 13, 2019
@campoy
Copy link
Contributor

campoy commented Mar 13, 2019

It seems like C# is then not supported in any way other than language detection for the Engine.
I'd consider simply dropping mentions to C# on the release notes for 0.11.

@smacker
Copy link
Contributor

smacker commented Mar 13, 2019

just for the record:

  1. it works for such query:
SELECT UAST(f.blob_content, 'csharp') AS uast
FROM refs AS r
NATURAL JOIN commit_files
NATURAL JOIN files AS f
WHERE r.ref_name = 'HEAD' AND f.file_path REGEXP('.*.cs')
  1. it also works in bblfsh-web.

@creachadair
Copy link
Contributor

creachadair commented Mar 13, 2019

It seems like C# is then not supported in any way other than language detection for the Engine.
I'd consider simply dropping mentions to C# on the release notes for 0.11.

Dropping mentions is a safe choice. But note that the issue is in the language detection, not the functionality: Specifically the detection labels a C# file as "C#" which doesn't match Babelfish "csharp" and thus the file never makes it to the parsing step.

So this is a real bug, and needs a real fix, but the core functionality is solid. So I think we can demo the feature safely, modulo the need to work around that issue. Unfortunately, while the solution isn't difficult, it touches a bunch of plumbing, so it isn't as easy as patching one layer.

@carlosms
Copy link
Contributor

I've tested it a bit more and this works with 0.11.0:

srcd parse uast csharp.cs --lang csharp

So the real limitation is that you can't rely on automatic language detection, but you can still get UAST if you set the language manually (like @smacker said for the SQL queries). I will add this information to the known issues section of the release notes.

I've also tried a quick workaround, in case we think it's worth to release engine again to fix this issue on our side until it is handled by bblfshd.

If we remove this check, these 3 cases would work:

srcd parse uast csharp.cs
srcd parse uast csharp.cs --lang c#
srcd parse uast csharp.cs --lang csharp

@dpordomingo
Copy link
Contributor Author

why can't we check if one of the aliases of the required language is one of the exposed by bblfsh?

I mean:

Doing so, the filter will be still there, but working properly.

@carlosms
Copy link
Contributor

As a workaround, yes that makes sense. Now the question is if we want to do a workaround, or just wait for the proper handling of aliases by bblfshd.

@dpordomingo
Copy link
Contributor Author

Sorry, I missed something... I don't see why is it a problem on bblfsh...
I ran bblfsh container with images:

$ docker run --rm -d --name bblfshd --privileged -p 9999:9432 bblfsh/bblfshd:v2.11.8-drivers

And then I asked for the uast of a c# file in all possible ways:

$ bblfsh-cli example.cs 
$ bblfsh-cli -l csharp example.cs
$ bblfsh-cli -l c# example.cs 

all of them worked.

It also worked for c++

$ bblfsh-cli example.cpp 
$ bblfsh-cli -l c++ example.cpp 
$ bblfsh-cli -l cpp example.cpp 

🤔 not for bash

When I tried asking about a non existent driver language, it properly failed:

$ bblfsh-cli -l cobol example.cobol
couldn't parse example.cobol: rpc error: code = Unknown desc = unexpected error: runtime failure: missing driver for language "cobol"

@dpordomingo
Copy link
Contributor Author

So my question is:

Does the filter makes sense in source{d} Engine?

  • If we want to avoid sending files to bblfsh if we already know that there is no driver for it -> then yes: it makes sense, and we could enhance it as I explained.
  • If we don't care about sending files to bblfsh, we can drop it, and let bblfsh say that it could not parse it.

@bzz
Copy link
Contributor

bzz commented Mar 15, 2019

for the proper handling of aliases by bblfshd.

sorry for butting in, but just for the context - as noted in bblfsh/bblfshd#263 so far ETA for proper aliases in bblfsh is Q22019.

Meanwhile @dpordomingo suggestion https://github.com/src-d/engine/issues/297#issuecomment-473219312 on applying public enry API LanguageByAliasMap to the results of protocol.SupportedVersion before comparing it to enry.GetLanguage output inside the engine product sounds very safe and reasonable feature.

I would argue that this is different from what has been discussed in this thread before - it's not a workaround specific for C# and it should be working the same way and not require a change later, after introduction of proper aliases support on bblfsh side.

@carlosms
Copy link
Contributor

So my question is:

Does the filter makes sense in source{d} Engine?

  • If we want to avoid sending files to bblfsh if we already know that there is no driver for it -> then yes: it makes sense, and we could enhance it as I explained.
  • If we don't care about sending files to bblfsh, we can drop it, and let bblfsh say that it could not parse it.

We didn't have this check on the installed drivers, then we added it after the discussion on src-d/sourced-ce#162, creating src-d/sourced-ce#163 & src-d/sourced-ce#186.
So yes, the only reason it's there is to show a nicer error message in case the user requests to parse a file for which we don't have a driver. Another possibility discussed in #186 was to send the parse request and then check the error returned, but that required support from https://github.com/bblfsh/client-go/issues/106.

@creachadair
Copy link
Contributor

From the user's perspective, it'd be nice if we can avoid doing extra work for queries we know can't succeed. Once we have sorted out the mapping correctly, I think that's a reasonable tactic. But early filtering does force a coupling between the two APIs that—in retrospect—I suppose we didn't think through clearly enough.

In the meantime: I'd like to make sure the user gets an "understandable" result even if we have to do a little extra work (e.g., attempt to parse a file that a human might know won't work). The question for the UI, though, is how to tell the difference between "this failed because I couldn't find a driver for it" (which is what the pre-filter is a proxy for) and "this failed because the driver reported an error".

I should know but do not: Do we already use canonical error codes to distinguish cases like this, which could inform the UI how to give feedback to the user when a parse fails?

@marnovo
Copy link
Member

marnovo commented Apr 16, 2019

Workaround from v0.12 release notes:

Known Issues

#297: srcd parse does not detect the language automatically for C#, C++, or bash files. For these languages you will need to set --lang manually. For example:

$ srcd parse uast file.cs --lang csharp
$ srcd parse uast file.cpp --lang cpp
$ srcd parse uast file.bash --lang bash

@carlosms carlosms removed their assignment Apr 22, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working empathy-sessions Issue filed as as part of empathy sessions
Projects
None yet
Development

No branches or pull requests

7 participants