Calculate the grade level of a text passage

Easily and accurately calculate a text's readability.

Installation:

$ pip install new-dale-chall-readability

Let's try it out:

$ ipython

In [1]: from new_dale_chall_readability import cloze_score, reading_level

In [2]: text = (
   ...:     'Latin for "friend of the court." It is advice formally offered '
   ...:     'to the court in a brief filed by an entity interested in, but not '
   ...:     'a party to, the case.'
   ...:     )

In [3]: reading_level(text)
Out[3]: '7-8'

So it's grade 7–8 reading level.

In [4]: cloze_score(text)
Out[4]: 36.91

And yep, the 36.91 cloze score says it's moderately difficult.

So how is this useful? Well, here's one way:

My legal dictionary orders entries like amicus curiae from simplest to most complex. I think it helps with comprehension and learning. I coded the numeric cloze score as the sort key.
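Here's a minimal sketch of that sorting idea. It isn't the dictionary's actual code: `sort_simplest_first`, the placeholder definitions, and every score in `HYPOTHETICAL_SCORES` are invented so the example runs without the package installed.

```python
from typing import Callable


def sort_simplest_first(
    definitions: list[str],
    score: Callable[[str], float],
) -> list[str]:
    """Order definitions from most to least readable.

    A higher cloze score means an easier passage, so sort descending.
    """
    return sorted(definitions, key=score, reverse=True)


# Stand-in scores (all invented) keyed by placeholder definitions.
HYPOTHETICAL_SCORES = {
    "definition A": 21.5,
    "definition B": 52.0,
    "definition C": 36.91,
}

ordered = sort_simplest_first(
    list(HYPOTHETICAL_SCORES),
    HYPOTHETICAL_SCORES.__getitem__,
)
print(ordered)
```

With the package installed, passing `cloze_score` as the `score` argument should work the same way, since it takes a string and returns a number.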

What's "reading level" and "cloze score"?

Reading level is the grade level of the material, in years of education. The scale is from 1 to 16+.

Cloze is a deletion test invented by Taylor (1953). The 36.91 score above means that roughly 37% of the words could be deleted and the passage could still be understood. So a higher cloze score means a more readable passage. Scores "range from 58 and above for the easiest passages to 10-15 and below for the most difficult" (Chall & Dale, p. 75).

See the integration test file for text samples from the book, along with their scores.

Why yet another readability library?

Before creating this, I tried really hard to find a readability library that gave correct results and was built on a sound algorithm. I realized I really like Dale-Chall. But I found show-stopping bugs in the existing libraries that cause them to give wrong answers.

There are a ton of low-effort blog posts about Dale-Chall, and they all seem to have different ideas about how it works. So I started by ordering a copy of Readability Revisited: The new Dale-Chall readability formula. Then I used the book to code the library from scratch. My goal was to create the best library I could for analyzing text. First, it needs to give correct results, so I did my best to rigorously design and test the code. Second, it needs to be modern Python code that's super easy to use. So, no objects to instantiate and no odd module naming. Just a couple of functions to call.

It's 2022 and there are probably a half-dozen implementations on PyPI. So why create another one?

The existing libraries have issues that made me wonder if the results were accurate. For example:
- From my reading, I saw that reading levels are a set of ten "increasingly broad bands" (p. 75). And they have labels like 3 and 7-8. The existing readability libraries treat these as floating point numbers. But now I believe that an enumeration — or specifically, a Literal — captures the formula better: Literal["1", "2", "3", "4", "5-6", "7-8", "9-10", "11-12", "13-15", "16+"]
- I also couldn't find a good description of this "new" Dale-Chall formula, and how the existing libraries implement it.
- The readability scores are important for my international dictionary app: It shows definitions sorted with the most readable first, to increase comprehension. The entry for amicus curiae is a good example. But I was getting odd results on some pages.

My goals for this library:
- Use Test-Driven Development to squash bugs and prevent regressions.
- Turn examples from the book into test cases.
- Write modern Python. I'm no expert, so I'm learning as I go along. E.g.,
  - It passes Pyright strict-mode type-checking.
  - It uses recent type enhancements like Literal.
- Present a very easy API to use in any app or library.
  - No need to instantiate an object and learn its API.
  - Just import the needed function and call it.

The result is a library that provides, I think, more accurate readability scores.
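
To show why a Literal fits the ten bands better than a float, here's a small sketch. The alias name `ReadingLevel` and the helpers `band_rank` and `is_harder` are my own for illustration and aren't necessarily what the library exports. Only the ten labels come from the list above.

```python
from typing import Literal, get_args

# The ten "increasingly broad bands", easiest first.
ReadingLevel = Literal[
    "1", "2", "3", "4", "5-6", "7-8", "9-10", "11-12", "13-15", "16+"
]

# get_args returns the labels in declaration order.
BANDS: tuple[str, ...] = get_args(ReadingLevel)


def band_rank(level: ReadingLevel) -> int:
    """Position of a band on the ten-step scale (0 = easiest)."""
    return BANDS.index(level)


def is_harder(a: ReadingLevel, b: ReadingLevel) -> bool:
    """True if band `a` sits above band `b` on the scale."""
    return band_rank(a) > band_rank(b)


print(band_rank("7-8"), is_harder("9-10", "7-8"))
```

A type checker like Pyright can then flag a typo such as `band_rank("7-9")`, which a float or a plain string wouldn't catch.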

References

Chall, J., & Dale, E. (1995). Readability revisited: The new Dale-Chall readability formula. Brookline Books.

Taylor, W. (1953). Cloze procedure: A new tool for measuring readability. Journalism Quarterly, 30, 415-433.
