Does anyone know how to do this...?

Antimacassar   November 11th, 2011 8:54a.m.

It would be really useful to know how many distinct characters a text uses (or even words, but I imagine that would be harder), i.e. not a word count as such, but something similar to the information the Character Writings Learned graph gives you. For example, if 应该 and 应当 were used, it would count 3 characters.

It would be useful since then I could focus on texts that I could easily read without having to check the dictionary every 5 minutes.

Does anyone know any way to go about this without manually checking through a text (or part of it)?
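
For the character-level question, counting distinct characters is simple enough to script. Below is a minimal Python sketch. It assumes that "Chinese character" means anything in the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), so punctuation, Latin letters and the rarer extension blocks are ignored. The function names are hypothetical, chosen for illustration.

```python
import re
from collections import Counter

# Assumption: "Chinese character" = basic CJK Unified Ideographs block (U+4E00..U+9FFF).
# Punctuation, Latin letters and rarer extension blocks are ignored.
HAN_RE = re.compile(r"[\u4e00-\u9fff]")


def distinct_characters(text: str) -> set:
    """Return the set of distinct Chinese characters used in `text`."""
    return set(HAN_RE.findall(text))


def character_counts(text: str) -> Counter:
    """Return how many times each Chinese character occurs in `text`."""
    return Counter(HAN_RE.findall(text))


if __name__ == "__main__":
    sample = "应该，应当。"
    print(len(distinct_characters(sample)))  # 3: 应, 该, 当
    print(character_counts(sample)["应"])    # 2
```

Counting words rather than characters needs a segmentation step first, because written Chinese does not mark word boundaries with spaces.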

Mandarinboy   November 11th, 2011 9:54a.m.

There are many tools out there that can do that. I am almost done with my own updated .NET version of such a tool, where you can feed in your Skritter-exported vocabulary and scan a text to see how many of its words and/or characters you know. I have used it before to parse newspapers and movies to build better word-frequency lists. I currently use it to find suitable newspaper articles for my daily reading: I do not like to look up more than 10 words per article, so it is useful to me in that sense.
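The "how much of this article do I already know?" check can be sketched at the character level without any segmentation. The following Python sketch is not the .NET tool described above. It assumes your known vocabulary is already available as a plain list of words (the format of a Skritter export is not covered here), and it applies the 10-item threshold to distinct unknown *characters* rather than words. All names are hypothetical.

```python
import re

# Assumption: a Han character is anything in U+4E00..U+9FFF.
HAN_RE = re.compile(r"[\u4e00-\u9fff]")


def known_characters(vocab):
    """Collect every Chinese character that appears in a list of known words."""
    return {ch for word in vocab for ch in HAN_RE.findall(word)}


def unknown_characters(text, known):
    """Distinct characters in `text` that are not in the `known` set."""
    return set(HAN_RE.findall(text)) - known


def coverage(text, known):
    """Fraction of character occurrences in `text` (repeats included) that are known."""
    tokens = HAN_RE.findall(text)
    if not tokens:
        return 1.0
    return sum(ch in known for ch in tokens) / len(tokens)


def is_readable(text, known, max_unknown=10):
    """True if `text` uses at most `max_unknown` distinct unknown characters."""
    return len(unknown_characters(text, known)) <= max_unknown


if __name__ == "__main__":
    known = known_characters(["我们", "学习", "中文"])  # hypothetical vocabulary list
    article = "我们学习中文和日文。"
    print(sorted(unknown_characters(article, known)))
    print(round(coverage(article, known), 2))
    print(is_readable(article, known))
```

Using coverage over repeated occurrences, rather than over distinct characters, weights a character that appears many times more heavily. That matches how often you would actually reach for the dictionary while reading.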

Most of those tools, including mine, work like this: take a fairly long string of text, e.g. 15 characters, and search for it in a database of Chinese words/phrases. If there is no match, remove the last character and repeat the search until you find a match. If no word matches at all, you are left with a single character, which is counted on its own. You then continue from the character after the match. There can be much more logic on top of this to handle grammar patterns, proper names etc., but in general that is how the basic search works.

In my case, the output also turns the words/characters I do not know into hypertext carrying a dictionary definition, so I can see it directly without having to look it up again: mouse over and you have the translation. As always with brute-force work like this, it is not 100% accurate, but it is surprisingly good anyway. Works for me, at least.
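The longest-match-first search described above is usually called forward maximum matching. Here is a minimal Python sketch of it, using the 15-character window from the post. The dictionary is a hypothetical toy set (a real tool would load a full word list, for example CC-CEDICT), and the function names are invented for illustration. It does none of the extra handling for grammar patterns or proper names.

```python
def is_chinese(token):
    """True if the token contains at least one CJK Unified Ideograph."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in token)


def segment(text, dictionary, max_len=15):
    """Greedy forward maximum matching.

    From the current position, try the longest candidate (up to `max_len`
    characters). If it is not in `dictionary`, drop its last character and
    try again. If nothing matches, emit the single character on its own.
    Then continue from the character after the emitted token.
    """
    tokens = []
    i = 0
    while i < len(text):
        end = min(i + max_len, len(text))
        # Shrink the candidate from the right until it is a dictionary entry
        # or only one character is left.
        while end > i + 1 and text[i:end] not in dictionary:
            end -= 1
        tokens.append(text[i:end])
        i = end
    return tokens


def unknown_words(tokens, known):
    """Chinese tokens not in `known`, in order of first appearance, without repeats."""
    return [t for t in dict.fromkeys(tokens) if is_chinese(t) and t not in known]


if __name__ == "__main__":
    # Hypothetical mini-dictionary for illustration only.
    dictionary = {"我", "爱", "中华", "人民", "共和国", "中华人民共和国"}
    tokens = segment("我爱中华人民共和国。", dictionary)
    print(tokens)  # ['我', '爱', '中华人民共和国', '。']
    print(unknown_words(tokens, known={"我", "爱"}))  # ['中华人民共和国']
```

Because the match is greedy, it can split a sentence wrongly when a long dictionary entry swallows the first character of the following word. That is one source of the less-than-100% accuracy mentioned above.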

Just search Google for "Chinese text segmenter" or similar. I will post an update once my own Skritter-friendly version is done.

Antimacassar   November 11th, 2011 9:04p.m.

Many thanks!

junglegirl   November 12th, 2011 12:16a.m.

@Mandarinboy: your tool sounds great; do let us know when it's ready!

icebear   November 12th, 2011 12:10p.m.

I second that: the tool you're working on sounds really helpful. I feel similarly about reading news articles; I hate it when I end up looking up more words than there are sentences (particularly common in articles on technical subjects).
