
Movie Frequency list

atdlouis   June 10th, 2011 7:26p.m.

Hi Mandarin Boy,

I just wanted to thank you for putting up that Movie Frequency list of yours. I know you were also working on a news frequency word list - any idea when that will go up? That would be the most useful to me, although I will definitely be using the movie list too.

I also did a vocab search for "frequency" to see other frequency lists, but your movie list didn't show up. Does anyone know if this is because it hasn't been indexed yet, or is it a flaw in the vocab list search?

Thanks!

Alex

Mandarinboy   June 10th, 2011 7:50p.m.

I am still scanning the news. This takes longer than the movies did, since I have to crawl the sites and download the pages. I am now up to some 15,000,000+ words but would like to get to 60,000,000 before I create the lists. Probably about two more weeks, I guess. The older lists made by others usually scan just one newspaper and fewer than 5,000,000 words. I try to scan 8 different papers and all categories to get a wider range of words. I am also thinking about breaking some of the words down into word classes such as verbs, proper names, nouns, etc. I for one would like to pick up more verbs.
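The post doesn't describe the actual pipeline, so the following is only a hedged sketch of how a word frequency list can be built from crawled text. Chinese text has no spaces, so it has to be segmented before words can be counted; this sketch assumes the simplest baseline, forward maximum matching against a word lexicon, and then tallies the words across all documents (e.g. articles from several papers). The names `fmm_segment`, `build_frequency_list`, and the toy `lexicon` and `docs` are hypothetical illustrations, not Mandarinboy's or Skritter's code; real segmenters use much larger dictionaries and statistical models.

```python
from collections import Counter

def is_hanzi(word):
    """True if every character is in the CJK Unified Ideographs block."""
    return all("\u4e00" <= ch <= "\u9fff" for ch in word)

def fmm_segment(text, lexicon, max_len):
    """Forward maximum matching: at each position, take the longest
    lexicon word that matches; otherwise emit a single character."""
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words

def build_frequency_list(documents, lexicon):
    """Segment every document, drop punctuation and other non-Hanzi
    tokens, and return (word, count) pairs, most frequent first."""
    max_len = max(len(w) for w in lexicon)
    counts = Counter()
    for doc in documents:
        counts.update(w for w in fmm_segment(doc, lexicon, max_len) if is_hanzi(w))
    return counts.most_common()

# Hypothetical toy lexicon and corpus, for illustration only.
lexicon = {"白发", "微博", "中国", "老龄化"}
docs = ["中国的白发人越来越多。", "微博上也在讨论白发。"]
freq = build_frequency_list(docs, lexicon)
```

Here `freq[0]` is `("白发", 2)`: "white hair" is the only lexicon word found in both toy sentences. Splitting the list by word class (verbs, proper names, etc.) would additionally need a part-of-speech tagger, which this sketch does not attempt.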

studygood   June 12th, 2011 10:41a.m.

Would it be possible to compile a Chengyu 成语 frequency list? I would love to find out the 500 most frequent Chengyu.

Mandarinboy   June 12th, 2011 7:43p.m.

In theory, yes; I just need a good 成语 list to use for the parsing. I think there are some 5,000 to 20,000 成语, but most of them are very rare. I am working on a sentence-harvesting module, so if I find a large enough 成语 list I will try to do that.

Mandarinboy   June 12th, 2011 8:29p.m.

I just got a list of 30,000 chengyu. I can use that list and scan the Internet for how often each one is used. I can probably start that next week, but first I need to compile the list, fix it up, etc.
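A minimal sketch of scanning text for a fixed idiom list, assuming the list fits in memory as a set. Instead of searching the text once per idiom (30,000 passes), it slides a window of each idiom length present in the list (mostly 4 characters) over the text and does a set lookup per window. `count_chengyu` and the three sample idioms are hypothetical illustrations; note that an idiom containing a shorter listed idiom would count toward both.

```python
from collections import Counter

def count_chengyu(texts, chengyu_list, top_n=500):
    """Count occurrences of each listed idiom across all texts and
    return the top_n (idiom, count) pairs, most frequent first."""
    idioms = set(chengyu_list)
    lengths = sorted({len(c) for c in idioms})  # e.g. [4] for a pure 4-character list
    counts = Counter()
    for text in texts:
        for size in lengths:
            for i in range(len(text) - size + 1):
                window = text[i:i + size]
                if window in idioms:
                    counts[window] += 1
    return counts.most_common(top_n)

# Hypothetical sample list and texts, for illustration only.
chengyu = ["一石二鸟", "画蛇添足", "马马虎虎"]
texts = ["他做事马马虎虎，这样画蛇添足。", "这是一石二鸟的办法，别画蛇添足。"]
top = count_chengyu(texts, chengyu, top_n=500)
```

With `top_n=500` this directly answers the "500 most frequent chengyu" request, provided the input corpus is large enough.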

A fun thing about the current Internet scanning is that many words reflect today's topics in ways I was not expecting ;-) E.g., in 25th place among the most used words is actually "white/grey hair". Aging seems to be a popular topic even in China. Naturally there are also a lot of "new" words such as Facebook, Twitter, blog, etc. The list will be rather different from the ones that were made in the early 2000s.

Foo Choo Choon   June 13th, 2011 5:32p.m.

A chengyu frequency list would be extremely useful, in particular if details on the sources were provided.

Mandarinboy   June 13th, 2011 7:53p.m.

I have written a search algorithm that gives me each chengyu along with its context. In the database I also have some info about the pinyin and translation, and some about usage. I can't use that in a list at Skritter, but maybe I will make a frequency list of chengyu found in daily newspapers, plus an online dictionary on my own site where you can look up more details about them, or something similar. This will, however, have to wait until all the other searches are done.
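The post doesn't show the search algorithm itself, so this is only a hedged sketch of one way to pair each chengyu with its context: split an article into sentences at sentence-ending punctuation and keep every sentence that contains a listed idiom. `harvest_sentences`, `SENTENCE_END`, and the sample article are hypothetical names and data; checking every idiom against every sentence is fine for a demo but would need a faster lookup for a 30,000-entry list.

```python
import re

# Zero-width split point right after sentence-ending punctuation
# (Python 3.7+ allows re.split on empty matches).
SENTENCE_END = re.compile(r"(?<=[。！？!?])")

def harvest_sentences(text, chengyu_list):
    """Return (chengyu, sentence) pairs for each sentence containing an
    idiom from the list; the sentence is kept as the usage context."""
    idioms = list(dict.fromkeys(chengyu_list))  # dedupe, keep list order
    results = []
    for sentence in SENTENCE_END.split(text):
        sentence = sentence.strip()
        if not sentence:
            continue
        for idiom in idioms:
            if idiom in sentence:
                results.append((idiom, sentence))
    return results

# Hypothetical sample article, for illustration only.
article = "他做事马马虎虎。这是一石二鸟的办法！天气很好。"
pairs = harvest_sentences(article, ["一石二鸟", "马马虎虎"])
```

The harvested sentences could then be stored alongside pinyin and translation data as example usages for a dictionary entry.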

studygood   June 15th, 2011 12:07p.m.

Great!

I look forward to seeing it.

Please keep us updated.
