
format for bulk list creation?

jww1066   July 16th, 2009 7:18p.m.

Hi, I wonder what it would take to get a list of the N most frequent characters into Skritter. There are several here, for example:

http://lingua.mtsu.edu/chinese-computing/statistics/

The page for creating our own lists would require me to enter them in one at a time which would be bad for my carpal tunnel.

I am a programmer and would be happy to convert their tab-delimited files or Excel files into whatever format you might need. Do you have a text file format (or XML, whatever) that you would accept for creating lists in bulk?

James

nick   July 16th, 2009 10:11p.m.

Just paste them in, one on each line. You can put just the characters, or you can enter tab-delimited characters, traditionals, pinyin, and definitions, with any of those other fields omissible or reorderable, although it'll probably work best in that order. You could also use comma+space delimited fields, but that'd mess up if any of the definitions had commas and spaces in them.

But probably putting just the characters would be easiest for that list, since we'll have most of them and you can't define any single character we don't have until we make its strokes. So just separate them by new lines.
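For anyone converting a frequency table rather than typing, a minimal sketch of that first step might look like the following. It assumes a tab-delimited input where the character sits in a known column (index 1 here, which is a guess about the file layout, not something confirmed for the files linked above). `extract_characters` is a hypothetical helper name, and the sample rows are made up.

```python
import csv
import io


def extract_characters(tsv_text, column=1, limit=None):
    """Return the characters from one column of tab-delimited text, in order.

    `column` is an assumption about the input layout; adjust it to match
    the actual file. Blank rows and rows that are too short are skipped.
    """
    chars = []
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) <= column:
            continue
        char = row[column].strip()
        if char:
            chars.append(char)
        if limit is not None and len(chars) >= limit:
            break
    return chars


# Made-up sample rows: rank, character, count (illustrative only).
sample = "1\t的\t100\n2\t一\t90\n\n3\t是\t80\n"
print("\n".join(extract_characters(sample)))
```

The printed result is one character per line, which is the form that can be pasted in directly.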

You'll also have to put them in sections, which you can do like this:

* 1-100
...
* 101-200
...

Let me know if that works for you. Putting that many in might be slow, but be patient and it'll work eventually.
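Since the sections are just header lines starting with `*` followed by one character per line, the pasting text can be generated rather than assembled by hand. A rough sketch, assuming the characters are already in frequency order; `build_sectioned_list` and its parameters are hypothetical names, not part of Skritter:

```python
def build_sectioned_list(chars, section_size=100, start_rank=1):
    """Format characters as sectioned bulk-entry text.

    Each section starts with a "* first-last" header line, followed by
    one character per line, matching the layout described above.
    """
    lines = []
    for offset in range(0, len(chars), section_size):
        chunk = chars[offset:offset + section_size]
        first = start_rank + offset
        last = first + len(chunk) - 1
        lines.append(f"* {first}-{last}")
        lines.extend(chunk)
    return "\n".join(lines)


# Tiny demonstration with a section size of 2 instead of 100.
print(build_sectioned_list(["的", "一", "是", "不", "了"], section_size=2))
```

Passing `start_rank=1001` would label the sections of a second list as 1001-1100, 1101-1200, and so on.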

jww1066   July 16th, 2009 11:43p.m.

Works fine, although a little repetitive.

I have no idea about the traditional characters. I assume the selection lists that appear when I add items to the list are there for me to choose the right traditional character? In that case I am very much the wrong person to be making that decision.

I published one list as "1000 Most Frequent (Informative Text)"; let me know if there's anything wrong with it before I start making other lists like that one.

James

jww1066   July 16th, 2009 11:47p.m.

Actually I did notice something strange. That list has a lot of overlap with other stuff I've been studying; for some of those characters, you carried over the "known" state for the character and tone because I had already studied them, while for many other characters (中 for example) you say I don't know it or its tone. Why did it work for only some of them?

James

murrayjames   July 17th, 2009 5:29a.m.

Thanks James, great list.

nick   July 17th, 2009 8:39a.m.

Yup, looks good, James!

For the traditional variants: ideally, the correct ones will be selected. For single characters, anything will work, but if you put in the wrong one, people might get rarer variants. Usually defaulting to the first one is good, except in the case of radicals, where if you choose the first one, then you'll put in the radical form instead of the normal usage full form.

If someone with better knowledge of 繁体字 so desires, she can remix the list and select the best traditional versions.

For 中 and other apparently progress-less characters: I guess because you've only studied that character in other words, and not by itself, it's not displaying its progress. It is keeping track of the progress, though, so if you do add it, it'll be at some really long interval. Make sense?

jww1066   July 17th, 2009 9:12a.m.

There are a number of problems when I try to upload hundreds of characters at once.

When I get errors, the red counter increases but I don't see what the problem is. There's no indication of where the problems occurred or how to fix them.

The number of words it thinks it is adding to the list is also off; it thinks that section dividers are words.

The biggest problem is that, after adding them to the list, many of the characters have definition "_undefined". I would imagine this is related to the errors I encountered but couldn't resolve.

James

scott   July 17th, 2009 11:16a.m.

Hmm, odd bugs. For the first two, could you send me some example text? Preferably via email (scott@skritter.com) since I imagine it's a lot of text, and I will try to figure out what's going on. I'll be doing some optimizations with the list editor today so I'll also spend some time trying to get those fixed.

The undefineds are actually part of a change we made recently. We made it so that you can create lists with characters that we haven't built yet and so are unavailable for study. When you add from those lists or otherwise add those missing characters, they get added to your study like everything else but only start showing up once we put in the characters. This means you don't have to keep track of when missing characters become available; they will automatically start showing up when they do become available, and you can make your list complete now even if we don't completely support it. Sound good?

jww1066   July 17th, 2009 3:04p.m.

Well, that explains problems 1 and 3. It was showing "_undefined" for the definition, though, which made me think that it had failed to properly parse the record. Could you make it show a less frightening error, maybe "We don't have the stroke order for this character yet"? Also, what do you need to do to get the stroke order? Is that something I can track down for you?

The second problem seems to be a design issue; the count of characters also seems to include the number of *section lines.

James

jww1066   July 17th, 2009 3:07p.m.

Published "1001-2000th Most Frequent" - feedback welcome.

James

scott   July 17th, 2009 3:17p.m.

Ah okay, I just actually hooked up some logic which will display a more informative message throughout the site when you come across one of those characters, so that will go live soon.

As for building characters, it's not so much getting stroke order as it is building the recognizers for the strokes we don't have, which is a bit time-consuming, and is unfortunately not something we can have users do (though you guys help us tune them when you draw the strokes we do have). Nick is in charge of that, and he's going to do the ones for Chinese and Japanese as soon as time permits. Not sure when that will be, though.

As for the section count, I can't seem to recreate it. The gray number shows how many sections are in the widget, and when I add some test sections it goes up while the other three numbers stay the same. Is it working differently for you?

There was another bug: trying to add words whose characters we didn't have didn't work, and you would still get a red box. I'm about to upload a fix for that too.

jww1066   July 17th, 2009 4:00p.m.

The section count problem is still there.

1. Click on "Add Words"
2. Enter
   a line with a character
   a line with * somesectionname
   a line with another character
3. Click "Validate Words"
4. The button next to "Validate Words" displays "Add 3 words" even though you only added 2
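The count expected in these steps can be stated as code. This is only a sketch of the intended behaviour, assuming section dividers are exactly the lines that start with `*`, as in the examples in this thread. It is not Skritter's actual validation code, and `count_entries` is a hypothetical name.

```python
def count_entries(text):
    """Count word lines and section lines separately in bulk-entry text."""
    words = 0
    sections = 0
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # ignore blank lines
        if stripped.startswith("*"):
            sections += 1  # section divider, not a word
        else:
            words += 1
    return words, sections


words, sections = count_entries("中\n* somesectionname\n国\n")
print(f"Add {words} words ({sections} section)")
```

With the three lines from step 2, only the two character lines count as words, so the button should read "Add 2 words".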
