Live Captioning with Dragon

A SetBC project to provide students with speech to text input from their teacher, using Dragon Naturally Speaking


Working Through Issues…

The issues we were having were mostly about the security settings that the District had set up. For example, the software could not save the voice files. It tried to save to the local drive and, when it couldn’t, it searched for the network drive but never found it. The students in our District all work on the x drive and, although they have access, they cannot write to it. The computer had to be set up as a personalized computer.

Our solution to this problem was worked out last week. After various phone calls and emails between Robert Palmquist, Jeff (at Speechgear), and Igor Pavlina of our SD67 Tech department, the settings were changed on our laptop. We had initially installed the software as personalized for the user of that specific software, but it needed to be installed as non-personalized software.

After working with it for one week we now know that it will save the voice files. Kathy Ryan from SETBC was in to help us yesterday. She trained her voice and the team was able to watch Interact-AS caption onto the laptop. Initially the captioning was 95% accurate. After four minutes the accuracy deteriorated rapidly to about 40%. The room was relatively quiet when she was talking. We think we might have some issues with the audio equipment. We have an email in to Dan Paccioretti at Phonak to ask for his expertise.

Interact-AS saves each lesson as a voice and text file. We have decided to turn off the voice file for now as it takes up more memory. We will turn on this function if we need it.

Training and Retraining

We have been working to get the project up and running all through February. We had the Teacher train for a few minutes and encountered only about 25% accuracy. Our thinking was that she was using sophisticated, science-specific language and probably needed a longer training with more difficult words. We now know that the training should not last more than 6 minutes. In that time the software learns how you produce the 42 sounds in the English language and creates a statistical word grouping and temporal time stamps. Interact-AS will continue to improve over time.

Once a voice profile is compromised it could take months to get back to where it should be. It is better to retrain. You should never share a speech profile. We have retrained the profile and it is at 95% accuracy.


The accuracy was also compromised because each time the computer turned on it automatically reverted back to its internal microphone. Each time we turn on the computer we are now turning on the external microphone settings.

During training we were using the MyLink from Phonak at its highest setting. It was pointed out to us that on the sound check it was as if we were shouting into Interact-AS during the training. We now turn the setting to low when we are training, then turn it up when we are receiving.

 

Robert Palmquist suggested that instead of using the USB Audio Adapter from Dragon we should purchase a Pure Audio USB adapter at about $39. From his experience he felt it would work better. We have not purchased this yet as we are still working on getting things going with the equipment that we have.

We learned that you always have to plug in the USB adapter before you start up Interact-AS.

Right now we are finding that it takes 5 – 10 minutes for Interact-AS to start up and shut down. It should only take 5 seconds. We are also seeing two error messages. Robert has sent us a fix and Igor will debug the Software this morning.

We decided to trial Interact-AS in the classroom.  The Teacher’s user profile was there but the trained Teacher’s voice profile was gone. Igor arrived and completed the fix sent to us by Robert but the voice profile did not reappear. We have a call into Robert Palmquist to try and find the problem. This is the fourth time the Teacher has trained Interact-AS and her voice profile has disappeared.


We are setting up a meeting with Robert today to try and figure out what is happening.

New Technology

Robert said he has an upgrade to Interact-AS (406) that he could give us for free. It is for non-verbal speakers. The product only came out on Tues. March 4. The manual has not been created but he could train us if we want. We did not take him up on his offer yet. He also mentioned that a new microphone created for conversation is ready. The one that would work for this project is Z.MIC.SAM.HP. It costs around $395, with additional headsets at $49 each. Currently all of our sound equipment is from Phonak. We do not think we want to mix and match.

 


Getting things going…

The project laptop has been imaged and now has access to District wireless services. It is Jan. 28, and we are finally starting to put all the pieces in place. Sometimes, as in all projects, stuff happens beyond our control.

This last week we spent working out the final ‘bugs’. We did a quick ten-minute training with the Teacher but were not pleased with the accuracy of the speech recognition. Using the half day provided by SETBC, the Teacher did additional training on her speech profile. This time she trained with a more difficult passage, because a more sophisticated profile was needed for the scientific language in the classroom.

Last year’s experience taught us that moving too quickly to put the laptop with the speech recognition in front of an already challenged student only frustrated the student further. We are moving slower this year but are determined to see it working optimally before we introduce it to the student.

First Meeting

Participants: Karen Bell, Arnold Moeliker, Jessa Arcuri, Sandra Cureatz, Dan Paccioretti, Andrea Devito, Jason Corday

Our first meeting was on the afternoon of Oct. 24. Two new people have come on board this year. We have a new District D/HH Teacher, Arnold Moeliker, and classroom Teacher, Jessa Arcuri. The first part of the meeting was to bring our new people up to speed. Then we watched the YouTube video clips 1-6 of Interact-AS for Educators by Auditory Sciences.  http://www.youtube.com/watch?v=gNtnCRALoMw&list=UU5UJmBev_7WpSZhW-Z29vTw

The new team members were amazed at what Interact-AS offers and wished they had it for other students in previous years.

Dan Paccioretti, Audiologist from Phonak Canada arrived at 2:30. He upgraded our Inspiro transmitter to Roger and brought us a new Roger Mylink. He spent some time advising us on the project and trained the OCFs and District D/HH Teacher on the new system. We patched the Roger Digital technology into our existing Frontrow infrared mounted speakers using a Phonak Digimaster X. We are now using Digital technology instead of FM. The sound is crystal clear. At first, we will trial this project only during the Science block to keep things simple. If we are successful the setup can move from class to class as we have mounted Frontrow Speakers in each classroom.

The Summerland Quest Service Club donated enough money to buy a laptop for the project. We asked the club to donate the laptop for two reasons. First, we decided to use a laptop with optimal specs (i7 processor, 6-8 GB of RAM and 64-bit OS). Last year we found that the quality of the laptop affected the accuracy of the captioning. We do not want to put more stress on students who are already having difficulty processing language in the classroom. Second, we found that the SD67 Guest network was unreliable for long-term use. By having a District laptop we could receive all of the perks enjoyed by other laptops in the District.

The Interact-AS software arrived at the school on Oct. 28, after our meeting, so we have not had a chance to use it.

The project laptop is sitting at the board office, with the software. Our District changed to Windows 7 over the summer. This was a major undertaking for the Tech Dept. They are still working through District issues and have not gotten to our laptop. We are stalled on the project until they can find the time.

New Project for 2013-2014!

Live Captioning in Science 8

The SETBC proposal for this project was written by Karen Bell, SD#67 Support Teacher.

Project Summary: To make the Classroom Teacher’s auditory message a visual message using Interact-AS and Phonak Roger Digital technology, thereby assisting students with:

  • Memory issues,
  • Motor issues,
  • Written output issues,
  • Organization issues,
  • Compromised audition,
  • Processing speed,
  • Limited English speaking capabilities.

This Project is an extension of Live Captioning with Dragon Special Project 2012-2013. Last year, after networking with researchers in the field, we came to believe we are closer to completing our goal and have found a better match to fit the needs of our students. This is our proposal to trial for 2013-2014:

  • When the Teacher is delivering instruction she will use: Interact-AS software, our School’s existing Frontrow infra-red sound field mounted speakers, and Phonak Roger Digital technology, to create captions on the D/HH Student’s laptop. The student with the compromised hearing will have the laptop in front of him as he needs real time captioning. The lecture can then be saved, emailed to other students in the class, including our SETBC student, who needs the notes and can hear the Teacher’s message without having to multi-task. This will allow the students to refer back to the Teacher’s oral instructions/directions for: assignments, clarification, review, and vocabulary development.  To network Interact-AS software we would need a reliable internet connection. If we have a strong connection to the internet we could use a free site like www.join.me.com to network last year’s SETBC project’s computers. Then all students could have a live capture in front of them. Once captured the information can be manipulated to reinforce the learning.

Why Captioning?

Within any classroom, students need to have the ability to hold relevant information in their brains. If they are challenged with memory issues, have motor issues (can’t get the notes down), or are not hearing the information to begin with, all learning is compromised. As cognitive load is accelerated throughout the curriculum, this becomes a challenge. Teachers at the upper grades have more curriculum to cover and are delivering a large portion of the lesson using a lecture style, with perhaps fewer visual supports. There are many opportunities in the grade 8 science curriculum where the students would benefit from live captioning while listening to a lesson in one of the following areas:

Project Evaluation:

The project will be successful if the lesson can be captured consistently with 90% accuracy.
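One way the 90% target could be checked is to compare a saved caption file against a hand-corrected transcript of the same lesson. The sketch below is only an illustration: it assumes accuracy means 100% minus word errors divided by total reference words, counts errors as a word-level edit distance, and uses hypothetical function names and sample sentences that are not part of Interact-AS.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Count word substitutions, insertions, and deletions (word-level edit distance)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # previous[j] holds the distance between the reference words seen so far
    # and the first j caption words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(previous[j] + 1,          # word missing from caption
                               current[j - 1] + 1,       # extra word in caption
                               previous[j - 1] + cost))  # match or substitution
        previous = current
    return previous[-1]


def accuracy_percent(reference: str, hypothesis: str) -> float:
    """Accuracy as a percentage of reference words captured correctly (an assumed definition)."""
    total = len(reference.split())
    if total == 0:
        raise ValueError("reference transcript is empty")
    return 100.0 * (total - word_errors(reference, hypothesis)) / total


def meets_target(reference: str, hypothesis: str, target: float = 90.0) -> bool:
    """True when the caption reaches the project's 90% accuracy target."""
    return accuracy_percent(reference, hypothesis) >= target


# Hypothetical sample: what the teacher said versus what was captioned.
spoken = "the cell membrane controls what enters and leaves the cell"
captioned = "the sell membrane controls what enters and leaves the cell"
print(word_errors(spoken, captioned), accuracy_percent(spoken, captioned))
```

Running the check on several lessons, rather than one, would speak to the word "consistently" in the evaluation criterion.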

Existing Equipment:

  • Existing Frontrow infrared sound field speakers (not required, but the new equipment from this project allows all students in the class to benefit from this project.)

Requested (and Approved) equipment:

Funding request from SETBC

  • Interact-AS software: $795 US (one-time cost), Supporting Success for Children with Hearing Loss
  • $159 yearly for upgrades and support (probably needed for the first year while doing the trial). Yearly upgrades and support beyond the project will be covered by the District. Yearly upgrades and support are not necessary to keep Interact-AS running.

Support from other Partners:

Phonak Canada

  • To switch from FM to Phonak Roger Digital wireless (Roger released June 24, 2013), Phonak will supply the equipment under the same arrangement as last year through SETBC. The following will need to be completed:
  • Upgrade the current Inspiro Premium transmitter to Roger
  • Switch the MyLink receiver to Roger

Supplied by the Summerland Quest Service Club

  • Laptop computer (i7 processor, 6-8 GB of RAM and 64-bit OS)

Requested (and approved) support:

  • A release day for the Classroom Teacher and CEA/Oral communication Facilitator to learn Interact-AS with some help from Kathy Ryan SETBC Consultant.
  • Assistance with troubleshooting technical setbacks from both the District and specialist support from an audiologist (Dan Paccioretti, Audiologist, Phonak Canada). Monitoring the project on an ongoing basis to give suggestions/feedback to streamline the process.

History/Additional Information:

Our school was part of a pilot project two years ago to install Front Row To Go infra red soundfield systems with mounted speakers in each classroom. This captioning project could easily be trialed in other areas of the school.

This project started at an IEP meeting in June 2012. We had three Hard of Hearing students entering French Immersion 6. Our initial thought was to provide Dragon Naturally Speaking captioning on the Smart Board. In Sept. 2012, we met as a group and the range of options was methodically examined. Features were demonstrated, ideas discarded and gradually a practical working model was created. This project, we soon discovered, was beneficial not only to the Deaf and Hard of Hearing but also to other special needs students. We decided to trial the project in a grade 7 English speaking classroom last year.

After encountering difficulties with accurate live capturing of the Teacher’s voice we realized that much of teaching is conversational. Dragon Naturally Speaking is made for dictation. The screen capture was one block of text without periods and capitals. (‘Auto Punctuation on’, was not reliable for revoicing.) We soon realized that Grade 7 students were not skilled at extracting text. We decided to have the CEA/Oral Communication Facilitator revoice the Teacher’s message. Although this worked it was tiring for the CEA/Oral Communication Facilitator and she was unavailable to help the other students in the classroom. To further complicate things we were having issues with FM interference. The live capture was not always very accurate.

In April 2013 we hit a crossroads. We heard through Dan Paccioretti about a well-respected Audiologist named Karen Anderson (Minnesota) who was researching the same problem. She introduced us via email to Robert Palmquist, the developer of Interact-AS. Robert explained that Interact-AS shares Dragon’s core phonetic engine but uses a different linguistic model. They are two different data sets. Interact-AS is designed for captioning conversation. After careful consideration and research we feel this software may be what we have been looking for to provide an accurate, cost-efficient method to live caption in the classroom.

In addition, Phonak Canada has just developed a new Roger Digital wireless system, (Roger released June 24, 2013), that will be replacing FM. Dan Paccioretti, Audiologist from Phonak Canada, suggested that we trial Roger as it would eliminate any channel and frequency issues. Phonak Canada is based in Mississauga, Ontario, and provides audiology equipment to this project. They have been following our progress via the blog.

As we have progressed through the project this year we have networked with many people around the world who are also grappling with this same problem. Working with SETBC enabled us to have a set framework, time, support and money to connect with other specialists in the field to work on a common problem. Cost efficient live captioning is needed. Technology is being developed rapidly. We need to find the right fit for our students so that they can reach their learning potential.

Individuals Involved with this project:

Karen Bell – Support Teacher

Arnold Moeliker – District Hearing Resource Teacher

Jill McCullum – Consulting Former Project D/HH Teacher

Jessa Arcuri – Classroom Teacher

Sandra Cureatz – Classroom Oral Communication Facilitator (OCF)

Erica McDowell – Oral Communication Facilitator/Computer Assistance

Kathy Ryan – SETBC Consultant

Jason Corday – Principal

Dan Paccioretti – Audiologist for Phonak Canada

Daniel Francisco – IT Manager

Igor Pavlina – IT Support

Anita Toneatto – Technology Helping Teacher/Blog manager

Students:

-Category D Student with memory issues due to a brain injury

-Student with severe language disability

-Cochlear Implant DHH student

**** Special thanks to Phonak Canada and the Summerland Quest Club who provided equipment for this project.

Thank you

We would like to thank all of the people who were involved with the project this year. As we progressed through the project, we networked with many people around the world who are also grappling with this same problem. Working with SETBC enabled us to have a set framework, time, and money to connect with other specialists in the field to work on a common problem. Cost efficient live captioning is needed. Technology is being developed rapidly. We need to find the right fit for our students so that they can reach their learning potential. We are already excited about continuing in a new direction next year.

Jill McCullum is retiring this year. She will not be able to be present at our “meetings” anymore but assures us she is only a phone call away. Thank you Jill, it has been a pleasure working with you!

Project Recommendations…

Recommendations or Advice to Other Teams or Educators Conducting a Similar Project

  1. TEAM, there is no ‘I’ in team. This research could not have evolved without a skilled, dedicated, positive thinking Team. Consultation is essential but time to achieve this must be budgeted.
  2. Blogging, despite the time and effort it took, was essential to maintaining a scope and sequence. It has enabled this research to expand and reach a wide-ranging audience. We have gone from a local project to an international audience. It enabled us to network with other research-savvy professionals.
  3. Use optimal hardware (computers that exceed the recommended specifications).
  4. Before trialing with students in the classroom setting, extend the trial period until the bugs are worked out. (As logical as this may seem, we got caught up in the excitement and didn’t do due diligence.)
  5. Teacher(s) need to be on board and tech savvy.
  6. Interference issues/ check channels and receivers if using dynamic FM. If using new Phonak ‘Roger’ digital wireless technology (release date June 2013) cross frequency issues will be eliminated. (we are evolving with this new system next year 2013-2014).
  7. Keep it simple: use one computer and expand from there.
  8.  Dragon is good for dictation but we haven’t solved fluent speech to text output. We are moving the targets, now looking into new software designed for conversation.
  9. Dragon is difficult to read because it is one continuous paragraph on the screen. Someone has to edit it, either the student or the teacher. Grade 7s are not able to extract notes, and eye fatigue is an issue for D/HH students.
  10. Accuracy was never fully achieved, and never solved, but we are not ready to give up.
  11. Having the OCF re-voicing creates extra white noise in the classroom and takes away from other duties that the OCF or CEA would normally perform in the classroom, e.g. passing around the handheld microphone.
  12. We are not entirely eliminating the OCF component yet, but would ideally prefer the classroom message to originate from the classroom teacher.
  13. It is essential that you have access to appropriate tech support. Not a call center, not someone unfamiliar with School District protocol, not someone who lacks ability to be a team player, not a corporation focused on promoting a product vs achieving best student practice.
  14. Dollars $$$$, they say money makes the world go around and unfortunately this is also true of the Education Research World…try to have your funding stem from one source. This eliminates the team of thousands.
  15. Be prepared to make change.
  16. Have FUN, don’t forget the adage ‘you learn from your mistakes’.

 

Error Analysis of Dragon Naturally Speaking

The following data was compiled by Sandra, the OCF involved in this project.

  • Dictation was in the afternoon when students were not in the classroom
  • All Revoicing was during announcements while students were in class
  • All samples were 2 pages long
  • Auto punctuation was on
  • All tests were completed by the same OCF

| Hardware (SetBC laptops used; for laptop specs, see below) | Dictated from book (# word errors/total words) | % Inaccuracy | Revoiced by CEA/OCF (# word errors/total words) | % Inaccuracy |
|---|---|---|---|---|
| *Laptop 1, headphones plugged in by USB | 32/937 | 3% | 168/747 | 22% |
| **Laptop 2, headphones plugged in by USB | 39/904 | 4% | 54/749 | 7% |
| *Laptop 1 with wireless Phonak and MyLink | 33/756 | 4% | 94/632 | 15% |
| **Laptop 2 with wireless Phonak and MyLink | 24/1098 | 2% | 47/605 | 8% |

*Laptop 1: Toshiba Satellite Pro, 3GB RAM, Intel Core Duo Processor, 32 bit, Windows 7 Pro, SP1

**Laptop 2: Toshiba Notebook, 4GB RAM, Intel i5 Processor, 32 bit, Windows 7 Pro, SP1

Conclusion: Although the Phonak equipment performed optimally in both trials it appears that the quality of the laptop determines how accurate the capture is.
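For anyone wanting to repeat this analysis with their own samples, the percentages above can be reproduced from the raw counts. The following is a minimal sketch, assuming that % inaccuracy is simply word errors divided by total words, rounded to the nearest whole percent; the dictionary labels and function names are ours, chosen for illustration.

```python
# (word errors, total words) for each trial, copied from the data above.
TRIALS = {
    "Laptop 1, USB headphones": {"dictated": (32, 937), "revoiced": (168, 747)},
    "Laptop 2, USB headphones": {"dictated": (39, 904), "revoiced": (54, 749)},
    "Laptop 1, Phonak and MyLink": {"dictated": (33, 756), "revoiced": (94, 632)},
    "Laptop 2, Phonak and MyLink": {"dictated": (24, 1098), "revoiced": (47, 605)},
}


def percent_inaccuracy(errors: int, total_words: int) -> int:
    """Word errors as a whole-number percentage of the words in the sample."""
    return round(100 * errors / total_words)


def summarize(trials: dict) -> dict:
    """Return {setup: {mode: % inaccuracy}} for every trial."""
    return {
        setup: {mode: percent_inaccuracy(*counts) for mode, counts in modes.items()}
        for setup, modes in trials.items()
    }


for setup, result in summarize(TRIALS).items():
    print(f"{setup}: dictated {result['dictated']}%, revoiced {result['revoiced']}%")
```

Laid out this way, the dictation results stay within 2% to 4% on both laptops, while the revoicing results separate the two machines (15% and 22% on Laptop 1 against 7% and 8% on Laptop 2).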

Guest Post: Personal FM systems and voice to text software

Where is that interference coming from?

As a Special Education Teacher who is new to the world of channels and frequencies, I went through a huge learning curve this year. In this project we were trying to include not only Deaf/Hard of Hearing Specialists, but also general Special Education Teachers like myself. Dan Paccioretti, Audiologist Phonak Canada was a huge part of our team and has summed up the information he shared with us in his usual easy to understand explanations. Not everyone has the opportunity to have Dan personally come and explain why things are not working. Hopefully this will help teams who run into the same issues and don’t have someone like Dan to call.

Karen Bell

 

The following is a guest post by Dan Paccioretti:

Voice to text software applications such as Dragon Naturally Speaking or Interact-AS require excellent signal to noise ratios (SNRs) to achieve high levels of accuracy with transcriptions. The use of a personal FM system is one method to achieve delivery of speech at a high SNR to the computer running the software, while at the same time providing the hard of hearing student with direct access to the primary speaker through personal FM receivers coupled to their hearing aids.

In order to ensure the highest quality of input from the FM system into the computer, care must be taken that the equipment has been set up correctly. The following items should be considered for a successful application of this technology.

Transmitter – an FM transmitter that allows for the use of a directional boom style microphone is preferred. A boom microphone will place the microphone in the best location to pick up the speaker’s voice as it is located within 1 to 2 inches of the mouth. This placement provides an excellent SNR, both because of the location and because the directional microphone greatly reduces interference from background noise. In addition the student will also benefit from this position while listening with their FM receivers. The Phonak inspiro transmitter was selected as it met these criteria.

Receiver – The FM receiver provides a wireless link between the teacher’s microphone and the computer running the software. To achieve this the FM receiver needs to have a coupling option that will allow a direct connection using the microphone jack on the computer. The Phonak MyLink is an FM receiver that has an audio output that can be used for headphones. This audio output can also be used to connect the receiver to the computer. The headphone jack on the MyLink is 2.5 mm while a computer’s microphone jack is almost always 3.5 mm. Using a 2.5 to 3.5 mm adaptor along with a 3.5 to 3.5 mm patch cable allows for a good connection.

 

 

 

 

 

 

Channel (frequency) to broadcast – A channel needs to be designated to broadcast the teacher’s voice to the MyLink attached to the computer for transcription and to the hard of hearing student to listen to with their personal FM receivers. The 216 MHz band is the recommended band to use as this band has been allocated for the purpose by Industry Canada and thus is protected from use by other technologies. The student’s receivers and the MyLink can be set to receive the same signal from the FM transmitter so only one transmitter needs to be worn by the teacher. Care must be taken to ensure that no one else in the school is using the same channel, otherwise there will be interference between the two transmitters trying to broadcast on the same channel. Other possible users of the 216 MHz band are classroom amplification (Soundfield) systems that use FM. As there is no standardization in the industry, each company will label their FM channels differently. You may need to consult channel charts or the various manufacturers to determine what the actual frequency is for each stated channel number.

The 216 MHz band has a limited number of channels that can be used without interference from other channels within the band. Care must be taken to choose channels that are separated far enough apart from each other so as not to cause unwanted interference. The chart below shows the Phonak channels on the 216 MHz band and provides information on potential interferences.


There are six channels on this list that can be used within the same room at the same time and not interfere with each other; these are N01, N09, N17, N64, N72 and N80. If more than 6 channels are needed within a particular school then creating a map of channels will be necessary to make sure that the required separation as noted on the chart is maintained.
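A school's channel map can be as simple as a table of rooms and their assigned channels. The sketch below is one possible way to check such a map, not a Phonak tool: it flags any channel assigned to more than one room and any channel outside the six listed above, which would need to be checked against the separation chart. The room names and the channel "N05" in the example are hypothetical.

```python
from collections import defaultdict

# The six Phonak 216 MHz channels listed above as usable together in one room.
CO_USABLE_CHANNELS = {"N01", "N09", "N17", "N64", "N72", "N80"}


def check_channel_map(room_channels: dict[str, str]) -> list[str]:
    """Return warnings for a room -> channel assignment; an empty list means no issue found."""
    rooms_by_channel = defaultdict(list)
    for room, channel in room_channels.items():
        rooms_by_channel[channel].append(room)

    warnings = []
    for channel, rooms in sorted(rooms_by_channel.items()):
        if len(rooms) > 1:
            # Two transmitters on one channel will interfere with each other.
            warnings.append(f"{channel} is assigned to more than one room: {', '.join(sorted(rooms))}")
        if channel not in CO_USABLE_CHANNELS:
            # Not necessarily wrong, but the required separation must be confirmed.
            warnings.append(f"{channel} is not one of the six co-usable channels; check the separation chart")
    return warnings


# Hypothetical school layout for illustration.
school = {"Science 8": "N01", "Math 8": "N09", "Library": "N01", "Gym": "N05"}
for warning in check_channel_map(school):
    print(warning)
```

A check like this only catches bookkeeping mistakes; the daily listening check is still what confirms the signal is actually clean.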

Listening Check – each day that the equipment is going to be used, a listening check of the personal FM system should be completed to ensure that there is no interference that might compromise the software’s transcription or reduce the student’s hearing accessibility. This is easily accomplished by attaching a set of headphones to the MyLink receiver and listening to the FM broadcast to make sure the sound is good and clear. The patching cable and adaptor should be inspected regularly to make sure they are in good order. It is a good idea to have backup connectors handy should one of the items become suspect.

Digital Wireless Transmission – a new type of wireless technology from Phonak is just being launched that does not use FM as the transmission signal and thus avoids many of the issues raised above. This digital wireless transmission runs on the 2.4 GHz (ISM) band. Using advanced transmission protocols of repeated broadcast of audio packets and frequency hopping, this new transmission does not use traditional channels and is virtually interference free. The SNR achieved by this new technology is the highest ever achieved in a personal system. This will mean the delivery of cleaner audio signals to both the personal digital receivers and the computer. For further information on this technology please see the Phonak website under the heading Roger.

https://www.phonakpro.com/ca/b2b/en/products/roger.html

Getting Closer

After insightful conversations with Karen Anderson, PhD, Director of Supporting Success for Children with Hearing Loss, and Robert Palmquist, developer of Interact-AS software, we are feeling that our new direction will be to trial Interact-AS software. Karen and Robert have been valuable resources. Robert Palmquist describes the difference between Dragon Naturally Speaking and Interact-AS in this way, “Dragon is designed for dictation of documents; Interact-AS is designed for captioning conversations. It’s kind of like having a screwdriver and a hammer. Both are very useful tools, but you need to use them in situations in which they’re designed to be used. You could use a screw driver to pound a nail, but it won’t work nearly as well as if you used the hammer. When you’re using Dragon to caption conversations, it’s the same thing. It works, but not as well as the tool that’s designed for that task. Use Dragon to dictate documents; use Interact-AS to caption conversational free speech.”

On April 11, 2013, we had a conference call meeting with Robert Palmquist and our skilled Tech Team to determine the viability of continuing our research. Specifically we needed to know:

  • The hardware needed to run Interact-AS. At minimum, Interact-AS absolutely must operate on an Intel i5 or faster CPU, with 4 GB of RAM, and Windows XP, Vista, 7 or 8. Any unit even a couple of years old will probably not have the computational power needed. Optimal would be a laptop with an i7 processor, 6-8 GB of RAM and a 64-bit OS. This is a very powerful program. Purchasing a computer with a faster processor and more memory would also allow more flexibility with adding other programs, such as Microsoft Office, Google Earth, Comic Life, Photostory, etc., so the computer’s only use isn’t just Interact-AS.
  • We need to devote a computer to this project. Robert explained that Interact-AS is compatible with other software but not with Dragon. It shares Dragon’s core phonetic engine but uses a different linguistic model. They are two different data sets.
  • Features of Interact-AS: it appeals to our population, captioning of Teacher’s voice, recognizes multiple speakers, only one license needed, not locked to one computer, extract text from voice dictation, highlight and revoice of text when needed e.g. at home, (if a Teacher is uncomfortable with this feature it can be turned off.) etc.
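The hardware requirements quoted above can be turned into a quick screening check before a laptop is committed to the project. This is only a sketch under our own assumptions: the CPU ranking table is a rough ordering we made up for illustration, the category names are ours, and the check ignores the Windows version. The two older laptops in the example are the ones described in the Dragon error analysis.

```python
from dataclasses import dataclass

# Assumption: a rough ordering of the CPU families mentioned on this blog.
CPU_RANK = {"core duo": 0, "i3": 1, "i5": 2, "i7": 3}


@dataclass
class Laptop:
    cpu: str      # CPU family, e.g. "i5"
    ram_gb: int   # installed memory in GB
    os_bits: int  # 32 or 64


def classify(laptop: Laptop) -> str:
    """Compare a laptop with the minimum and optimal specs quoted for Interact-AS."""
    rank = CPU_RANK.get(laptop.cpu.lower())
    if rank is None:
        raise ValueError(f"unknown CPU family: {laptop.cpu}")
    # Optimal: i7 processor, 6-8 GB of RAM, 64-bit OS.
    if rank >= CPU_RANK["i7"] and laptop.ram_gb >= 6 and laptop.os_bits == 64:
        return "optimal"
    # Minimum: Intel i5 or faster CPU with 4 GB of RAM.
    if rank >= CPU_RANK["i5"] and laptop.ram_gb >= 4:
        return "meets minimum"
    return "below minimum"


print(classify(Laptop("i7", 8, 64)))        # the kind of laptop we hope to obtain
print(classify(Laptop("i5", 4, 32)))        # Laptop 2 from the error analysis
print(classify(Laptop("Core Duo", 3, 32)))  # Laptop 1 from the error analysis
```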

Today we are more committed than ever to trial Interact-AS Software. We are facing the hurdles of:

  1. Obtaining the appropriate laptop ($1708 including tax) Maybe borrow from SETBC
  2. Purchasing the Interact-AS software to trial CAD $58 + $18 shipping Viable School Purchase.
  3. If the computer is District owned then programs that are part of the school district image will be placed on the laptop as well as having a connection to the secured wireless.
  4. If we wanted to network (using https://join.me/ downloadable free version) to more than one laptop it would require using the internet. (Our District currently may not have the capacity because of other wireless devices.) For duration of the trial we will send the revoicing to only the Hearing Impaired Student and email a copy to the Hearing Students.

 

After we contacted SETBC to borrow a computer (i7 processor etc.) for the trial of Interact-AS in May, it was suggested that, due to the short time left this year, we should complete our current assessment of Dragon and the Phonak system, then put forward another proposal next year to trial the new software (Captioning in the Classroom using Interact-AS) with more current hardware. Dan assures us that everything we used this year from Phonak will still be appropriate for the new software trial. Phonak has just launched Roger, a digital wireless technology that will replace FM in the near future. We will meet with Dan Paccioretti on April 24th to examine the Roger technology in relation to next year’s trial with Interact-AS.