18thConnect Discussion: TypeWright Correction Questions and Answers
http://18thconnect.org/forum/view_thread?thread=18

http://18thconnect.org/forum/object?comment=67
As a beginner in TypeWright, I have lots of questions! As a folklore scholar, I have questions about the conventions and accepted/best practices for correcting these printed-text items.
Let us use this as a forum to discuss things like "Is it more helpful for the '@' to replace each garbled letter, or to substitute for the entire word?" In fact, look for that question as the first thread!

http://18thconnect.org/forum/object?comment=68
Is it more helpful for the '@' to replace each garbled letter, or to substitute for the entire word?

http://18thconnect.org/forum/object?comment=71
Question #1 from Angela Vietto:
Will the crowd-sourced corrected texts make their way back to ECCO, or wherever they came from, or will they only be available here?

http://18thconnect.org/forum/object?comment=72
Question #2 from Angela Vietto:
When is the text "done"--are we having two or three people check a line before it's "100%"? I found a few lines that had been checked by someone but still needed obvious corrections.

http://18thconnect.org/forum/object?comment=73
Question #3 from Angela Vietto:
What conventions are we following?

http://18thconnect.org/forum/object?comment=74
Question #4 from Angela Vietto:
If there is a space between a quotation mark or an exclamation point, are we honoring it? (I've done a few pages and went ahead and kept them.)

http://18thconnect.org/forum/object?comment=75
Question #5 from Angela Vietto:
Are we keeping catch-words at the bottom of the page or not?

http://18thconnect.org/forum/object?comment=76
Question #6 from Angela Vietto:
How [do we deal with] signature markings?

http://18thconnect.org/forum/object?comment=77
Question #7 from Angela Vietto:
What do we do with pages featuring illustrations? The Plenipotentiary has at least one.

http://18thconnect.org/forum/object?comment=78
The above are questions from Angela Vietto posted as a comment on the 18thConnect news blog (see her original post at http://www.18thconnect.org/news/?p=417), many of which I have myself.

I will research the site-specific technical questions (#1 and #2) and post the answers when I get them. As for the editing questions, I look forward to the discussions!

http://18thconnect.org/forum/object?comment=79
Angela Vietto, we have not forgotten your questions! Look for answers by the end of next week!

http://18thconnect.org/forum/object?comment=80
Question #2 from Angela Vietto:
Q- When is the text "done"--are we having two or three people check a line before it's "100%"? I found a few lines that had been checked by someone but still needed obvious corrections.
A- There is a three-phase workflow for "completed" documents:
Step 1: Once each page has been significantly edited, and a user reaches the last page in a document, a "mark this document complete" button will appear in the editing interface. To TypeWright, the text is "done" when a user reaches the end of the document and marks the text "complete."
Step 2: Marking the document complete notifies our 18thConnect team.
We then have our 18thConnect admin editors "check" the document by reviewing the fully corrected text document.
Step 3: If the document is indeed complete, the 18thConnect admin editor notifies the user and offers a text and XML version of the document. If the document is not complete, the 18thConnect admin editor marks the document "not complete," and the document is available for crowd-sourced editing once again.

http://18thconnect.org/forum/object?comment=81
Question #1 from Angela Vietto
Q- Will the crowd-sourced corrected texts make their way back to ECCO, or wherever they came from, or will they only be available here?
A- Yes, the crowd-sourced corrected texts will make their way back to ECCO, improving ECCO's full text search capability. The corrected text will also be indexed by 18thConnect, and so improve our full text search capability, too. In this way, TypeWright work helps illuminate the dark corners that scholars were previously unable to interact with digitally.

Remember that you, the scholar working on the correction, will be offered digital versions of the text once it is deemed complete!

http://18thconnect.org/forum/object?comment=82
Question #7 from Angela Vietto:
Q- What do we do with pages featuring illustrations?
A- There are two actions that should be taken:
Step 1: Delete the "OCR" lines that TypeWright has identified in the image.
Because TypeWright red boxes are determined by the Gale OCR output we received from ECCO, and because this determination is mechanical, parts of images are often read as "lines." (Please note that the "deleted" red box will remain on the page, but the "text" will be deleted from the line in the OCR output.)
Step 2: Report the page - this information will make it to the 18thConnect team, who will mark the document for further analysis by the eMOP team. This page can then be considered in the eMOP team's efforts to "teach" OCR machines how to identify images as images.

http://18thconnect.org/forum/object?comment=83
Questions #3, #4, #5, and #6 from Angela Vietto
Q- What conventions are we following? How do we decide whether or not to honor extra spaces? Are we keeping or deleting page numbers, page titles, catch-words at the bottom of the page? How do we deal with signature markings?
A- Many scholars have at least one idea about how to answer these questions! We hope to use the "TypeWright Correction Questions and Answers" discussion to exchange ideas and reach a consensus on many of these questions of style. (Join the discussion: http://www.18thconnect.org/forum/view_thread?thread=18)
In the meantime, keep these two things in mind:
--Keep it simple, keep it searchable! TypeWright is meant to make texts "fully searchable" as well as to contribute to the preservation of our cultural heritage. Would a scholar be searching 18thConnect or ECCO for catchwords or page numbers? Would putting a space between punctuation and the end of a sentence hurt searchability?
--The "editor" chooses.
Because TypeWright is, in one conception, a precursor to your development of a digital edition, you can choose which conventions to follow, as it is part of your responsibility as editor. On the other hand, if you are just lending your "human eyes" and hands to a project, then look to see how the other major contributors to the corrections have answered your questions in their corrections.

http://18thconnect.org/forum/object?comment=86
I have a question about how long it takes for the TypeWright team to "check" a text after it is completed and to send the person(s) who worked on it the text/xml version of the document. I am planning a 2014 January-Term class, and I'd like to make correcting a TypeWright text and preparing a (simple) digital edition of it part of the coursework. But it's only a three-week course, so I am concerned about whether we will receive the xml document in time to work on the digital edition.

http://18thconnect.org/forum/object?comment=87
Reply to geremyc:
How exciting that you want your students to use TypeWright for their projects! We are in the midst of developing a formal workflow for evaluating and processing the completed documents from TypeWright. The current draft of this workflow places the turn-around time (from declared complete to decision) at one week. When we determine that an honest effort has been made to correct the OCR text, the major contributors will be offered the corrected text within that week in an e-mail which will ask in what format (text or XML) to forward the text. If only minimal or no corrections have been made, then the text will be returned to "TypeWright enabled" status for further correction, again within the week, and the contributor/corrector will receive an e-mail to that effect.

I do hope that this timeline will allow the incorporation of TypeWright into your January term assignment.
And please remember TypeWright for the classes you teach during the long terms!

http://18thconnect.org/forum/object?comment=325
Any consensus on keeping or deleting the line numbers?

http://18thconnect.org/forum/object?comment=354
Has the question about signature marks (and catchwords) been answered?