parse lang command returns blank results #277

Open
mcarmonaa opened this issue Mar 5, 2019 · 5 comments
Labels
empathy-sessions Issue filed as part of empathy sessions

Comments

@mcarmonaa

Related to the empathy session.

Running this query:

/* Top languages by repository count */
SELECT *
FROM (SELECT language, COUNT(repository_id) AS repository_count
      FROM   (SELECT DISTINCT
                r.repository_id,
                LANGUAGE(t.tree_entry_name, b.blob_content) AS language
              FROM   refs r
                      JOIN commits c ON r.commit_hash = c.commit_hash
                      JOIN commit_trees ct ON c.commit_hash = ct.commit_hash
                      JOIN tree_entries t ON ct.tree_hash = t.tree_hash
                      JOIN blobs b ON t.blob_hash = b.blob_hash
              WHERE  r.ref_name = 'HEAD') AS q1
      GROUP  BY language) AS q2
ORDER  BY repository_count DESC

I noticed that the result contains a blank language in the second position:

+-------------------+------------------+
|     LANGUAGE      | REPOSITORY COUNT |
+-------------------+------------------+
| Ignore List       |                6 |
|                   |                6 |
| Text              |                6 |
| Markdown          |                6 |
| JSON              |                6 |
| YAML              |                5 |
| Dockerfile        |                5 |
| INI               |                5 |
| Shell             |                5 |
| HTML              |                5 |
| Java              |                5 |
| Makefile          |                4 |
| JavaScript        |                4 |
| Python            |                4 |
| C                 |                4 |
| XML               |                4 |
| TOML              |                3 |
| Go                |                3 |
| Protocol Buffer   |                3 |
| SVG               |                3 |
| Groovy            |                3 |
| Unix Assembly     |                3 |
| Gradle            |                3 |
| Batchfile         |                3 |
| Java Properties   |                3 |
| Ruby              |                3 |
| CSS               |                3 |
| SQL               |                3 |
| Smarty            |                3 |
| Vim script        |                2 |
| CSV               |                2 |
| Git Config        |                2 |
| reStructuredText  |                2 |
| Git Attributes    |                2 |
| Perl              |                2 |
| Maven POM         |                2 |
| AsciiDoc          |                2 |
| XSLT              |                2 |
| PLSQL             |                2 |
| FreeMarker        |                2 |
| Java Server Pages |                2 |
| Kotlin            |                2 |
| PLpgSQL           |                2 |
| Less              |                1 |
| HAProxy           |                1 |
| PowerShell        |                1 |
| R                 |                1 |
| Ant Build System  |                1 |
| Scala             |                1 |
| Roff              |                1 |
| Yacc              |                1 |
| RMarkdown         |                1 |
| HTML+Django       |                1 |
| Thrift            |                1 |
| AspectJ           |                1 |
| Csound            |                1 |
| GAP               |                1 |
| SQLPL             |                1 |
| HTML+ERB          |                1 |
| HiveQL            |                1 |
| q                 |                1 |
| ANTLR             |                1 |
+-------------------+------------------+

This is the list of repositories I'm using:

  • github.com/src-d/gitbase
  • github.com/src-d/gitbase-web
  • github.com/bblfsh/bblfshd
  • github.com/apache/spark
  • github.com/spring-projects/spring-boot
  • github.com/spring-projects/spring-framework

I found that running srcd parse lang on this file and this file returns nothing.

Not sure if this is a bug or not.

carlosms added the empathy-sessions label Mar 5, 2019
@carlosms
Contributor

carlosms commented Mar 5, 2019

I think it makes sense to return an empty string in LANGUAGE() when it cannot be detected. You can always add a WHERE language <> '' if you need to filter them. What do you think @ajnavarro?

@ajnavarro

ajnavarro commented Mar 6, 2019

Yep, no lang detected is an empty string for enry, so we are returning that.

Edit:

We return NULL if no lang is detected by enry:

	lang := enry.GetLanguage(path, blob)
	if lang == "" {
		return nil, nil
	}

So that empty result is probably a NULL rather than an empty string.
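
One way to check which case applies is to list the files for which LANGUAGE() yields NULL. This is only a sketch: it reuses the tables and joins from the query in the issue description, it has not been run against these repositories, and the LIMIT value is an arbitrary choice.

```sql
/* Files on HEAD for which no language was detected (sketch) */
SELECT DISTINCT
  r.repository_id,
  t.tree_entry_name
FROM   refs r
        JOIN commits c ON r.commit_hash = c.commit_hash
        JOIN commit_trees ct ON c.commit_hash = ct.commit_hash
        JOIN tree_entries t ON ct.tree_hash = t.tree_hash
        JOIN blobs b ON t.blob_hash = b.blob_hash
WHERE  r.ref_name = 'HEAD'
  /* IS NULL matches undetected languages; an empty string would not match here */
  AND  LANGUAGE(t.tree_entry_name, b.blob_content) IS NULL
LIMIT  20
```

If this returns rows, the blank entry in the result table is most likely the NULL group, since GROUP BY collects all NULL values into a single group.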

@dpordomingo
Contributor

@carlosms is this an issue for Engine or for Gitbase?

@carlosms
Contributor

carlosms commented Mar 6, 2019

I think it's not a bug. If anything, we could edit the query example in gitbase-web to filter out empty languages... What do you think @mcarmonaa?

@mcarmonaa
Author

I think adding a filter for empty languages is a good idea. It would also serve as an example/documentation for this specific case, which might not seem obvious at first glance.
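
A sketch of what the filtered example could look like, assuming the same schema as the original query. Since LANGUAGE() returns NULL when nothing is detected, the filter uses IS NOT NULL; the additional `<> ''` check is a defensive assumption in case an empty string ever comes back.

```sql
/* Top languages by repository count, skipping undetected languages (sketch) */
SELECT *
FROM (SELECT language, COUNT(repository_id) AS repository_count
      FROM   (SELECT DISTINCT
                r.repository_id,
                LANGUAGE(t.tree_entry_name, b.blob_content) AS language
              FROM   refs r
                      JOIN commits c ON r.commit_hash = c.commit_hash
                      JOIN commit_trees ct ON c.commit_hash = ct.commit_hash
                      JOIN tree_entries t ON ct.tree_hash = t.tree_hash
                      JOIN blobs b ON t.blob_hash = b.blob_hash
              WHERE  r.ref_name = 'HEAD') AS q1
      /* Drop rows where no language was detected before counting */
      WHERE  language IS NOT NULL
        AND  language <> ''
      GROUP  BY language) AS q2
ORDER  BY repository_count DESC
```

In standard SQL, `language <> ''` on its own also drops NULL rows, because a comparison with NULL never evaluates to true. Writing IS NOT NULL explicitly makes the intent clearer to someone reading the example.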
