Dear CCC Members of COCOSDA:

We had two meetings in Beijing: a dinner meeting on Oct 18 and the formal COCOSDA Workshop on Oct 21. Many CCC members were in Beijing, and most of them attended both events, or most parts of them. They were: Nick, Wolfgang, Satoshi, Shuichi, Bruce, Justus, Professor Fujisaki, Professor Kurematsu, Joseph, Louis, and myself.

Below I try to summarize the major parts of the discussions, including some discussions I had with a few individual CCC members. In most cases we did not reach conclusions but simply exchanged opinions, partly because time was very limited and partly because some CCC members were not present. I would prefer that major decisions of COCOSDA be made by all CCC members via e-mail.

I am not sure the summary below is accurate or complete. Those of you who were present, please feel free to correct, modify or add anything. In any case, whether or not you were present at the meetings, please feel free to provide your comments in the space given below, sending them either to the whole group of CCC members or to me only. I hope we can reach good conclusions after some e-mail discussion. I look forward to hearing from you soon. Thanks.

Sincerely yours,
Lin-shan
Taipei

==========================================================
Summary of Discussions at Meetings in Beijing

1. Scope of COCOSDA Limited to Spoken Languages?

As speech and language processing technologies are converging very fast, it is becoming more and more difficult to define the boundary between the two. People very often ask whether COCOSDA also provides or collects information about written/text corpora, and some other organizations are actually working on corpora/evaluation for both spoken and written languages. COCOSDA has traditionally focused on spoken language processing only, and it seems that COCOSDA should keep this focus, at least for the moment.
This means COCOSDA's interests include those written language resources and evaluations that are directly related to speech, such as those for developing language models, lexicons, speech understanding and spoken dialogue systems, speech translation, etc., but not all written language resources. Covering all of speech and language is probably too much for COCOSDA at the moment.

On the other hand, it may be a good idea to develop a good coordination relationship with some other organization working on written languages. For example, if COCOSDA is collecting information about "spoken" local languages in a region and another organization is collecting information about "written" local languages in the same region, duplicated efforts should be avoided, and when the two sets of information are put together, they may form a complete picture of the local languages in that region.

2. Scope of COCOSDA Limited to Corpora and Evaluation?

Corpora and evaluation are the two most important elements of the "infrastructure" for speech technology development. But it seems more elements are emerging, such as software tools, software sharing, training programs, meetings, events, projects and relevant information sources. COCOSDA has been working on some of them: the topic domain of "corpus annotation tools" is one example, and the efforts of our rapporteurs in collecting relevant information are another. Some other organizations are making similar efforts as well. All these elements may fall under the general term "spoken language technology development resources". It may be reasonable for COCOSDA to consider these elements as well, but limited to those related to speech only, at least for the moment.

3. Where are the Corpora?

Many people ask: where can we find the corpora we want? Can COCOSDA give us an answer? Today there are quite a few data centers. LDC and ELRA are good examples, and one for Japanese has been established as well. People who are looking for corpora should go to those data centers.
COCOSDA is not a data center, but an organization for promotion and coordination. The future Website of COCOSDA should have all relevant information, including links to the Websites of all data centers in the world.

4. Relationship with other International/Regional Organizations?

Today there are quite a few international/regional organizations with functions relevant to COCOSDA. COCOSDA should develop reasonable working relationships with them. ISCA has offered the possibility of COCOSDA being a "Special Interest Group" of ISCA. Julia Hirschberg told us about the discussions at a meeting in Hong Kong. It seems that COCOSDA may try to develop "relatively loose collaboration relationships" but "relatively efficient coordination relationships" with these organizations. COCOSDA may consider asking for financial/organizational support from some organizations, on the condition that COCOSDA remains completely independent, at least for the moment. If a small "coordination committee" composed of representatives of these organizations is formed to promote coordination among them, COCOSDA will be willing to participate in such a committee.

5. Guideline and Procedures for Establishing a New Topic Domain?

COCOSDA has been structured with "Regions" and "Topic Domains" as the two dimensions of its functions. New topic domains are expected to be established in the future. Although there are many important areas in spoken language processing, the distinct features of COCOSDA make only a subset of these areas appropriate for a COCOSDA topic domain. Bruce previously drafted a very good guideline for defining a reasonable COCOSDA topic domain. Such a guideline will be very important for the future development of COCOSDA. Bruce is going to revise that draft in light of the discussions. The guideline will then be discussed by all CCC members electronically, finalized, and posted on the Website.
All future new topic domains will then be selected following the guideline. Once a topic domain is considered appropriate and important, a good rapporteur willing to do the work needs to be identified and approved by the existing CCC. The identified rapporteur should then write a short statement describing the topic domain, post it on the Website, and start to organize the work. The topic domain is then formally established.

6. A Possible New Topic Domain on Cross-Language Processing (Or Some Other Title)?

This may include parallel or comparable multilingual spoken language corpora (either on similar topics or contents, or with sentence-by-sentence correspondence), their annotation, and the evaluation of relevant applications such as cross-language indexing or retrieval, speech translation, etc. It is generally agreed that this is an important area appropriate for a COCOSDA topic domain, but slightly different opinions were expressed regarding its scope. For example, speech translation evaluation may be an important topic domain in its own right, or it may be included in the existing topic domain of evaluation of speech understanding/dialogue systems. Wolfgang will consider the latter possibility. Also, corpora and annotation for cross-language processing may be a good topic domain in their own right. Further comments/suggestions are invited.

7. A Possible New Topic Domain on Emotional Speech Corpora (Or Some Other Title)?

Significant research activity in this area has developed in recent years. It could be a good new topic domain. It was mentioned that this may also become a Special Interest Group of ISCA. It was also mentioned that COCOSDA should try to avoid duplicating efforts with ISCA, and should instead consider a good partition of work and good coordination. For example, the Special Interest Group of ISCA may focus on processing technology, while the topic domain of COCOSDA may focus on corpora, annotation and evaluation, etc. Further comments/suggestions are invited.

8. Website Construction?
The main Website will be maintained by Nick. Each topic domain and each region will construct its own Website, linked to the main Website. A few rapporteurs have already established their Websites. It was agreed that there should be a common format, to some degree, for the Websites of each region/topic domain to follow, so that it will be easier for people to find the information they want, while leaving enough room for each region/topic domain to develop its own distinct features. It was suggested that the rapporteurs plus Nick and Lin-shan should discuss the details electronically after returning from Beijing, and that Justus and Nick will initiate the discussions with an initial proposal.

9. Other subjects/issues which may be missing from the above?

-------------------------------------------------------------
Your Comments/Modifications/Suggestions Invited

1. Scope of COCOSDA Limited to Spoken Languages?
________________________________________________________________
________________________________________________________________
________________________________________________________________

2. Scope of COCOSDA Limited to Corpora and Evaluation?
________________________________________________________________
________________________________________________________________
________________________________________________________________

3. Where are the Corpora?
________________________________________________________________
________________________________________________________________
________________________________________________________________

4. Relationship with other International/Regional Organizations?
________________________________________________________________
________________________________________________________________
________________________________________________________________

5. Guideline and Procedures for Establishing a New Topic Domain?
________________________________________________________________
________________________________________________________________
________________________________________________________________

6. A Possible New Topic Domain on Cross-Language Processing (Or Some Other Title)?
________________________________________________________________
________________________________________________________________
________________________________________________________________

7. A Possible New Topic Domain on Emotional Speech Corpora (Or Some Other Title)?
________________________________________________________________
________________________________________________________________
________________________________________________________________

8. Website Construction?
________________________________________________________________
________________________________________________________________
________________________________________________________________

9. Other subjects/issues which may be missing from the above?
________________________________________________________________
________________________________________________________________
________________________________________________________________