Skritter | Chinese Breeze (Complete) Lists


Chinese Breeze (Complete) Lists

lechuan August 20th, 2012 12:05p.m.

I was just looking at the Chinese Breeze lists and noticed that they include only the words designated as vocab for each book.

I was thinking of making a full list (using every word in the book), but first wanted to check whether anyone has already done this for any of them.

Alan August 20th, 2012 1:03p.m.

This seems like a really good idea. I haven't managed to find these anywhere.

How are you thinking of doing this? Typing them in manually? Pleco OCR?

lechuan August 20th, 2012 1:15p.m.

I was thinking of doing Pleco OCR, manual corrections, and running it through zhtoolkit.com's Chinese word extractor.

Alan August 20th, 2012 1:47p.m.

If it works well, I'll give it a go with the Level 1 book I have. I assume you'll be doing the more advanced levels.

lechuan August 20th, 2012 2:01p.m.

@alanmd: That sounds great. How about we each post the title we plan to work on, to avoid duplicating effort?

I can start with Level 2, 我家的大雁飞走了 (Our Geese Have Gone).

I was thinking of making the sections in the list correspond with the chapters in the reader.

Anyone else interested in joining in? :)

Alan August 20th, 2012 4:12p.m.

Do you think it matters if there is a lot of duplication across chapters? If each chapter's list is mostly the same, it might be better to just order the list by first occurrence (if this is possible) and split it up into manageable parts. Does that tool let you filter out words that are already in a previous chapter?
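
The "order by first occurrence, then split" idea can be sketched in a few lines of Python. This is only an illustration, assuming the extractor output has already been read into a plain list of words in reading order; the function names `order_by_first_occurrence` and `split_into_parts`, the part size, and the sample words are all made up for the example and come from neither Skritter nor zhtoolkit.

```python
def order_by_first_occurrence(words):
    """Drop repeated words, keeping each word at the position it first appeared."""
    # dict keys keep insertion order (Python 3.7+), so fromkeys deduplicates in order.
    return list(dict.fromkeys(words))


def split_into_parts(words, size):
    """Cut a word list into consecutive parts of at most `size` words each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [words[i:i + size] for i in range(0, len(words), size)]


# Hypothetical sample: words in the order they occur in the book.
book_words = ["错", "我", "你", "错", "他", "我", "她"]
ordered = order_by_first_occurrence(book_words)
parts = split_into_parts(ordered, 2)
```

Each entry in `parts` could then become one section of the list, independent of where the chapter breaks fall.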

I'll do Level 1, Wrong, Wrong, Wrong! (错，错，错！). I might take a while to finish though, so if anyone wants to jump in, let me know and I can add you as an editor.

lechuan August 20th, 2012 5:25p.m.

Good point. I was hoping that Skritter might auto-detect and remove duplicates in the same list, but just tried it out and it doesn't.

If we stick the output of the Chinese Word Extractor in a text file (separated by chapter), I can put together a program/script to prune duplicates from subsequent chapters.
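
A minimal sketch of that pruning step, assuming each chapter's extractor output has already been loaded as a list of words (how the text file marks chapter boundaries is not specified here, so parsing is left out). The name `prune_repeats` and the sample words are hypothetical.

```python
def prune_repeats(chapters):
    """For each chapter, keep only the words not seen in any earlier chapter.

    `chapters` is a list of word lists, one per chapter, in book order.
    Repeats within a single chapter are dropped as well.
    """
    seen = set()
    pruned = []
    for words in chapters:
        fresh = []
        for word in words:
            if word not in seen:
                seen.add(word)
                fresh.append(word)
        pruned.append(fresh)
    return pruned


# Hypothetical two-chapter sample.
chapters = [["我", "家", "大雁"], ["大雁", "飞", "我", "走"]]
result = prune_repeats(chapters)
```

With this approach, each chapter's section holds only the words introduced in that chapter, so later sections shrink as the book goes on.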

Alan August 20th, 2012 5:30p.m.

I have written a few lines of Python 2.7 that will do the trick. If anyone wants them, they can email me (email address in my profile).

I added a list for the first 2 chapters of Cuo Cuo Cuo: http://www.skritter.com/vocab/list?list=203872114
Taking the photos and OCRing is a bit of effort, so I'll keep working on it as I go through the book (probably next month).

We should probably do any more detailed back and forth discussion on this by email to save annoying everyone else...

lechuan August 20th, 2012 7:10p.m.

Sounds good. Talk to you offline!

lechuan August 21st, 2012 8:35p.m.

Just as an update, so far there are 3 volunteers working on the following books:

Level 1: 错，错，错！ Wrong, Wrong, Wrong!
Level 1: 你最喜欢谁？ Whom Do You Like More?
Level 2: 我家的大雁飞走了 Our Geese Have Gone

nawor August 22nd, 2012 1:52a.m.

This sounds interesting. I may be able to help.

I scanned the Level 1 book 'I really want to find her' using an automatic document feeder and ran OCR on it using ABBYY FineReader.
I then processed it with zhtoolkit's tool.
It gives a total of 1327 unique words and 1149 unique unknown.
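
For anyone wanting to sanity-check counts like these, here is a rough sketch of how the two figures relate, assuming a segmented word list and a set of words the reader already knows. This is not how zhtoolkit computes its numbers; `vocab_stats` and the sample data are made up for illustration.

```python
def vocab_stats(words, known):
    """Return (number of unique words, number of unique words not in `known`)."""
    unique = set(words)
    unknown = unique - set(known)
    return len(unique), len(unknown)


# Hypothetical sample: a short segmented text and a small known-word set.
sample_words = ["我", "一定", "要", "找到", "她", "我", "她"]
total, unknown = vocab_stats(sample_words, known={"我", "她"})
```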

Quite a few unique words, right?

Is there an easy way to bring zhtoolkit's vocabulary list into Skritter?

The OCR output from ABBYY FineReader isn't perfect. There are approximately 10 suspect characters per page. It would take some time to go through each page and check the suspects.

lechuan August 22nd, 2012 2:04a.m.

Hi nawor,

Sounds cool! We'll discuss offline.

---------------

In case the question comes up, no one will be publicly posting text or scans of these books. We are just making vocab lists.

范博涵 August 27th, 2012 4:51p.m.

I started to scan and correct 'I really want to find her' using Pleco OCR. It scans the 汉字 perfectly, but does not handle punctuation or superscript very well. Two pages took almost 15 minutes. I bought all of the Chinese Breeze graded readers and would love to have the digitized version of the texts.

lechuan August 27th, 2012 5:43p.m.

Yes, it's really slow going. I'm spending about 15 minutes per page or two as well.

I can't post the digitized text for copyright reasons.

If you'd like, please send me an email at lechuan8@gmail.com and we can discuss.

lechuan August 27th, 2012 9:39p.m.

范博涵, just a note, nawor already started working on 'I really want to find her'.

Books in progress:

Level 1: 错，错，错！ Wrong, Wrong, Wrong!
Level 1: 你最喜欢谁？ Whom Do You Like More?
Level 1: 我一定要找到她... I Really Want to Find Her...
Level 2: 我家的大雁飞走了 Our Geese Have Gone

Todo:

Level 1: 两个想上天的孩子 Two Children Seeking the Joy Bridge
Level 1: 向左向右 Left and Right: The Conjoined Brothers
Level 1: 我可以请你跳舞吗？ Can I Dance With You?
Level 2: 电脑公司的秘密 Secrets of a Computer Company
Level 2: 青凤 Green Phoenix
Level 2: 如果没有你 If I Didn't Have You
Level 2: 妈妈和儿子 Mother and Son
Level 2: 出事以后 After the Accident
Level 2: 一张旧画儿 An Old Painting
Level 3: 第三只眼睛 The Third Eye

Byzanti August 28th, 2012 5:30a.m.

Hey, those actually look somewhat interesting. When I used a textbook every single version from elementary up to advanced just had a variation of '留学生 meets Chinese person at bus stop', '留学生 goes to Chinese person's house and is very polite'. I didn't stick with them...

Zeppa August 28th, 2012 7:31a.m.

Green Phoenix is great - it's a famous Chinese ghost story. I would prefer to learn the few characters and words I don't know and concentrate on practising reading, rather than Skrittering every character.

范博涵 August 28th, 2012 3:06p.m.

lechuan, "I really want to find her" is the only Chinese Breeze book I have right now. The rest of the books are with my wife in Beijing.

Today, I managed to scan 22 pages in less than an hour. I will scan the rest tomorrow. After that, I will probably have to spend another hour to fix all the punctuation marks.

lechuan August 28th, 2012 4:24p.m.

范博涵, that sounds great, but I was mainly worried about duplicating effort. 'nawor' has already processed (and proofread) half the book.
