Wikisource:Scriptorium/Archives/2025


Request to enable the AI recognition of further diacritic letters

Dear Wikisource Community & Tech Experts,

I contribute texts in the Northern Min language, and the current AI software is unable to recognize several of the diacritic letters on the pages, for example, THIS PAGE.

(Index page is here).

At present, contributors working in the Northern Min language have to type out this text manually, and each page takes approximately thirty minutes to complete. At that rate it would take 19,500 minutes (325 hours) to type out the entire New Testament in this language (approximately 650 pages).

If the AI can be enhanced to recognize those texts correctly, each page would take only about 5 seconds to complete, so the entire New Testament could be completed in 3,250 seconds (about 54 minutes), a time saving of more than 99%.
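
The estimate above can be checked with a short calculation. This is only a sketch in Python: the constant names are made up for illustration, and the figures (650 pages, thirty minutes per page by hand, 5 seconds per page with working OCR) are the approximations given in this post, not measurements.

```python
# Rough check of the time estimate; all figures are the approximations quoted above.
PAGES = 650                    # approximate page count of the New Testament
MANUAL_MINUTES_PER_PAGE = 30   # typing one page by hand
OCR_SECONDS_PER_PAGE = 5       # assumed time per page once OCR works

manual_minutes = PAGES * MANUAL_MINUTES_PER_PAGE
manual_hours = manual_minutes / 60
ocr_seconds = PAGES * OCR_SECONDS_PER_PAGE
ocr_minutes = ocr_seconds / 60

# Fraction of the manual time that would be saved.
saving = 1 - ocr_seconds / (manual_minutes * 60)

print(f"Manual: {manual_minutes:,} minutes ({manual_hours:.0f} hours)")
print(f"OCR:    {ocr_seconds:,} seconds ({ocr_minutes:.0f} minutes)")
print(f"Saving: {saving:.1%}")
```

With these inputs the saving works out to roughly 99.7%, not counting any time spent proofreading the OCR output.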

The AI software only needs to recognize these additional Special Characters (the ones highlighted in Red are currently unrecognizable by the AI software):

Acute: á í ú é ó á̤ é̤ ó̤ ṳ́ Á Í Ú É Ó Á̤ É̤ Ó̤ Ṳ́

Circumflex: â î û ê ô â̤ ê̤ ô̤ ṳ̂ Â Î Û Ê Ô Â̤ Ê̤ Ô̤ Ṳ̂

Double macron: a̿ i̿ u̿ e̿ o̿ a̤̿ e̤̿ o̤̿ ṳ̿ A̿ I̿ U̿ E̿ O̿ A̤̿ E̤̿ O̤̿ Ṳ̿

Macron: ā ī ū ē ō ā̤ ē̤ ō̤ ṳ̄ Ā Ī Ū Ē Ō Ā̤ Ē̤ Ō̤ Ṳ̄

Breve: ă ĭ ŭ ĕ ŏ ă̤ ĕ̤ ṳ̆ ŏ̤ Ă Ĭ Ŭ Ĕ Ŏ Ă̤ Ĕ̤ Ṳ̆ Ŏ̤

Grave: à ì ù è ò à̤ è̤ ṳ̀ ò̤ À Ì Ù È Ò À̤ È̤ Ṳ̀ Ò̤


Where does Wikisource derive its AI letter recognition software from?
Which organization is responsible for developing/updating/enhancing this AI Recognition software?
How can they be contacted on this matter?

Any feedback would be welcome. --DaveZ123 (talk) 22:49, 1 January 2025 (UTC)

I may be too ignorant here, but the software is optical character recognition (OCR), which I don't think uses AI. There are multiple OCR scanning solutions here at Wikisource. If you open up a page to edit (e.g.), then you should see a button that says "Transcribe text" which has a drop-down menu that allows you to use three different OCR tools, which will yield different results. Are all of them equally bad at transcribing Northern Min? One thing that could help is changing the page's information so that the language is mnp (note that only admins can do this). Unfortunately, mnp is not on the list currently. So I think step one is to file a bug at phab: asking for mnp to be added to the list of languages available here. Then change the page information on various pages to mnp and see if that helps with the OCR scans. If it does, then the solution is more language changes to make transcription more efficient. If it does not, then we may need to come up with another solution. —Justin (koavf)TCM 11:17, 2 January 2025 (UTC)
I tried to transcribe using all three OCR engines (Google Cloud Vision OCR, Tesseract OCR, and Transkribus OCR). Of the three, Google Cloud Vision OCR is the most accurate and can transcribe macrons: ā ī ū ē ō Ā Ī Ū Ē Ō and also breves: ă ĭ ŭ ĕ ŏ Ă Ĭ Ŭ Ĕ Ŏ.
However, double macrons and diacritics with a diaeresis below, namely:
a̿ i̿ u̿ e̿ o̿ A̿ I̿ U̿ E̿ O̿
ā̤ ē̤ ō̤ ṳ̄ A̤̿ E̤̿ O̤̿ Ṳ̿
á̤ é̤ ó̤ ṳ́ Á̤ É̤ Ó̤ Ṳ́
ă̤ ĕ̤ ṳ̆ ŏ̤ Ă̤ Ĕ̤ Ṳ̆ Ŏ̤
à̤ è̤ ṳ̀ ò̤ À̤ È̤ Ṳ̀ Ò̤
â̤ ê̤ ô̤ ṳ̂ Â̤ Ê̤ Ô̤ Ṳ̂
a̤̿ e̤̿ o̤̿ ṳ̿ A̤̿ E̤̿ O̤̿ Ṳ̿
still cannot be transcribed. The letters highlighted in red above are the letters that all three OCR engines are currently unable to transcribe. --DaveZ123 (talk) 01:57, 5 January 2025 (UTC)
I have now filed Phabricator bug request T383001 asking for "mnp" to be added to the list of language drop-down options.
Phabricator bug request T383002 asks for fifty-eight new letters to be added to Google OCR. --DaveZ123 (talk) 02:31, 5 January 2025 (UTC)
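
For anyone following up on these requests, it may help to see how such letters are encoded. The sketch below uses Python's standard `unicodedata` module to break a letter into its base letter and combining marks. It assumes the diaeresis below is U+0324 COMBINING DIAERESIS BELOW and the double macron is U+033F COMBINING DOUBLE OVERLINE; the source pages or the OCR output may use other code points, so treat both as assumptions. The helper name `describe` is made up for this example.

```python
import unicodedata

def describe(letter: str) -> list[str]:
    """Return the Unicode names of the code points in the decomposed (NFD) form."""
    decomposed = unicodedata.normalize("NFD", letter)
    return [unicodedata.name(ch) for ch in decomposed]

# Assumed encodings (not confirmed against the scanned pages):
u_diaeresis_below_macron = "\u1e73\u0304"  # u with diaeresis below, plus macron
a_double_macron = "a\u033f"                # a plus double overline

# Print only the code point names, so the output is plain ASCII.
for sample in (u_diaeresis_below_macron, a_double_macron):
    print(describe(sample))
```

Letters like these have no single precomposed code point and have to be written as a base letter followed by one or more combining marks, which may be part of why the OCR engines miss them.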

Tech News: 2025-03

MediaWiki message delivery 01:42, 14 January 2025 (UTC)

Launching! Join Us for Wiki Loves Ramadan 2025!

Dear All,

We’re happy to announce the launch of Wiki Loves Ramadan 2025, an annual international campaign dedicated to celebrating and preserving Islamic cultures and history through the power of Wikipedia. As an active contributor to your local Wikipedia, you are specially invited to participate in the launch.

This year’s campaign invites you to join us in writing, editing, and improving articles that showcase the richness and diversity of Islamic traditions, history, and culture.

To get started, visit the campaign page for details, resources, and guidelines: Wiki Loves Ramadan 2025.

Add your community here, and organize Wiki Loves Ramadan 2025 in your local language.

Whether you’re a first-time editor or an experienced Wikipedian, your contributions matter. Together, we can ensure Islamic cultures and traditions are well-represented and accessible to all.

Feel free to invite your community and friends too. Kindly reach out if you have any questions or need support as you prepare to participate.

Let’s make Wiki Loves Ramadan 2025 a success!

For the International Team 12:08, 16 January 2025 (UTC)

AbuseFilter

Hi, I just ran into an abuse filter while trying to blank some user talk page vandalism [6][7]. Please either blank the talk page yourself or speedy delete it (no meaningful content). I didn't ask for speedy deletion right away, because many projects never delete user talk pages.

Additionally I would suggest some changes to Special:AbuseFilter/13:

  • Going by the filter title, the action should just be "create" instead of "edit" (or you should change the title).
  • I recommend using user_rights instead of user_groups, that way you avoid false positives with crosswiki patrollers with global permissions.
  • If you want to keep the action "edit" instead of just page creations, I recommend adding namespace 2 and probably also 3 to the exempted namespaces, in order to avoid false positives like these [8][9]. Users should be allowed to blank their own user page or anything else in their own user space [10][11].

Pinging @Koavf because you appear to be the admin who did most abuse filter changes in the last ~2 years. Johannnes89 (talk) 07:22, 20 January 2025 (UTC)

@Johannnes89 Imagine my face while waking up to that notification PhilBrvni (talk) 09:01, 20 January 2025 (UTC)
Thanks. For what it's worth, the page is deleted now. —Justin (koavf)TCM 16:39, 20 January 2025 (UTC)
Done: https://wikisource.org/wiki/Special:AbuseFilter/history/13/diff/prev/cur —Justin (koavf)TCM 16:40, 20 January 2025 (UTC)

Tech News: 2025-04

MediaWiki message delivery 01:36, 21 January 2025 (UTC)

Universal Code of Conduct annual review: provide your comments on the UCoC and Enforcement Guidelines

Please help translate to your language.

I am writing to you to let you know the annual review period for the Universal Code of Conduct and Enforcement Guidelines is open now. You can make suggestions for changes through 3 February 2025. This is the first step of several to be taken for the annual review. Read more information and find a conversation to join on the UCoC page on Meta.

The Universal Code of Conduct Coordinating Committee (U4C) is a global group dedicated to providing an equitable and consistent implementation of the UCoC. This annual review was planned and implemented by the U4C. For more information and the responsibilities of the U4C, you may review the U4C Charter.

Please share this information with other members in your community wherever else might be appropriate.

-- In cooperation with the U4C, Keegan (WMF) (talk) 01:11, 24 January 2025 (UTC)

Tech News: 2025-05

MediaWiki message delivery 22:14, 27 January 2025 (UTC)

new Scriptorium head

hi everyone, i just created Template:Scriptorium Header for the head part. kindly let me know how we can improve it further or use it here -- KuldeepBurjBhalaike (Talk) 06:37, 2 February 2025 (UTC)

Courtesy link to the current template: Template:Scriptoriumhead. I'm fine either way, but please don't abuse the small tag, and once we've decided on a final layout, please redirect one template to the other. Thanks. —Justin (koavf)TCM 09:29, 2 February 2025 (UTC)
thanks for the quick response @Koavf, i'll update the header and will redirect the old template when we are ready. -- KuldeepBurjBhalaike (Talk) 09:53, 2 February 2025 (UTC)
update: i have redirected the old template to the new template KuldeepBurjBhalaike (Talk) 12:13, 16 February 2025 (UTC)

Global ban proposal for Shāntián Tàiláng

Hello. This is to notify the community that there is an ongoing global ban proposal for User:Shāntián Tàiláng who has been active on this wiki. You are invited to participate at m:Requests for comment/Global ban for Shāntián Tàiláng. Wüstenspringmaus (talk) 12:52, 2 February 2025 (UTC)

Reminder: first part of the annual UCoC review closes soon

Please help translate to your language.

This is a reminder that the first phase of the annual review period for the Universal Code of Conduct and Enforcement Guidelines will be closing soon. You can make suggestions for changes through the end of day, 3 February 2025. This is the first step of several to be taken for the annual review. Read more information and find a conversation to join on the UCoC page on Meta. After review of the feedback, proposals for updated text will be published on Meta in March for another round of community review.

Please share this information with other members in your community wherever else might be appropriate.

-- In cooperation with the U4C, Keegan (WMF) (talk) 00:49, 3 February 2025 (UTC)

Tech News: 2025-06

MediaWiki message delivery 00:08, 4 February 2025 (UTC)

Tech News: 2025-07

MediaWiki message delivery 00:11, 11 February 2025 (UTC)

Template:New texts

Hello everyone! Some time ago I found Template:New texts, which was created years ago but never used in this project. Would the community be interested in adding this template to the main page? Inspired by @Koavf:, I've experimented a little in the Sandbox, but I bet that someone here could create a more interesting visual for this template on the main page. Thanks, Erick Soares3 (talk) 22:17, 11 February 2025 (UTC)

Vandalism

Good evening! Please block the user Drochun. Vandalism. 1nter pares (talk) 19:13, 13 February 2025 (UTC)

Courtesy link: User:Drochun. —Justin (koavf)TCM 19:19, 13 February 2025 (UTC)
Done. —Justin (koavf)TCM 19:20, 13 February 2025 (UTC)
Thank you! 1nter pares (talk) 19:23, 13 February 2025 (UTC)

Tech News: 2025-08

MediaWiki message delivery 21:16, 17 February 2025 (UTC)

Upcoming Language Community Meeting (Feb 28th, 14:00 UTC) and Newsletter

Hello everyone!

We’re excited to announce that the next Language Community Meeting is happening soon, February 28th at 14:00 UTC! If you’d like to join, simply sign up on the wiki page.

This is a participant-driven meeting where we share updates on language-related projects, discuss technical challenges in language wikis, and collaborate on solutions. In our last meeting, we covered topics like developing language keyboards, creating the Moore Wikipedia, and updates from the language support track at Wiki Indaba.

Got a topic to share? Whether it’s a technical update from your project, a challenge you need help with, or a request for interpretation support, we’d love to hear from you! Feel free to reply to this message or add agenda items to the document here.

Also, we wanted to highlight that the sixth edition of the Language & Internationalization newsletter (January 2025) is available here: Wikimedia Language and Product Localization/Newsletter/2025/January. This newsletter provides updates from the October–December 2024 quarter on new feature development, improvements in various language-related technical projects and support efforts, details about community meetings, and ideas for contributing to projects. To stay updated, you can subscribe to the newsletter on its wiki page: Wikimedia Language and Product Localization/Newsletter.

We look forward to your ideas and participation at the language community meeting. See you there!


MediaWiki message delivery 08:29, 22 February 2025 (UTC)

Tech News: 2025-09

MediaWiki message delivery 00:41, 25 February 2025 (UTC)

Tech News: 2025-10

MediaWiki message delivery 02:30, 4 March 2025 (UTC)

Universal Code of Conduct annual review: proposed changes are available for comment

Please help translate to your language.

I am writing to you to let you know that proposed changes to the Universal Code of Conduct (UCoC) Enforcement Guidelines and Universal Code of Conduct Coordinating Committee (U4C) Charter are open for review. You can provide feedback on suggested changes through the end of day on Tuesday, 18 March 2025. This is the second step in the annual review process; the final step will be community voting on the proposed changes. Read more information and find relevant links about the process on the UCoC annual review page on Meta.

The Universal Code of Conduct Coordinating Committee (U4C) is a global group dedicated to providing an equitable and consistent implementation of the UCoC. This annual review was planned and implemented by the U4C. For more information and the responsibilities of the U4C, you may review the U4C Charter.

Please share this information with other members in your community wherever else might be appropriate.

-- In cooperation with the U4C, Keegan (WMF) 18:51, 7 March 2025 (UTC)

@-jkb-, Ooswesthoesbes: This wiki is not very big and may or may not need bureaucrats. If we are to keep them, please clear the backlogs and consider more nominations; I am open to this role and have past experience on Commons. Otherwise, should all bureaucrats resign collectively? Either way, please act on this.--Jusjih (talk) 03:54, 8 March 2025 (UTC)

Thanks for the ping. I'd be more than happy if you joined the team, so feel free to nominate yourself. --Ooswesthoesbes (talk) 06:50, 8 March 2025 (UTC)
You are very welcome, and I just nominated myself per your encouragement.--Jusjih (talk) 17:40, 8 March 2025 (UTC)
Please archive the finished ones, as many closed cases clutter the page.--Jusjih (talk) 02:33, 19 March 2025 (UTC)
Done myself as the newest bureaucrat.--Jusjih (talk) 00:33, 22 March 2025 (UTC)

Tech News: 2025-11

MediaWiki message delivery 23:09, 10 March 2025 (UTC)

FYI: Wikisource and Wikidata together: lessons from the Wikisource Conference

https://diff.wikimedia.org/2025/03/12/wikisource-and-wikidata-together-lessons-from-the-wikisource-conference/ —Justin (koavf)TCM 07:50, 12 March 2025 (UTC)

Gaelic pages to import from enws

Hi there! Could someone with importer rights take in en:Index:Gille dubh ciar-dhubh.pdf and its pages through Special:Import? Coming here because I believe there's no Gaelic Wikisource to host it. At some point in the near future we're going to delete it as out of scope (not in English), but we're waiting to see if we can first bring it where it belongs. (I'm an admin on enws.) Cheers, — Alien333 12:58, 12 March 2025 (UTC)

Done. —Justin (koavf)TCM 17:16, 12 March 2025 (UTC)
Also en:Index:Ioram na truaighe, le Issachari M'Aula do Thighearna Assinn.pdf, please. Same situation. -- Beardo (talk) 04:14, 13 March 2025 (UTC)
Done and proposed for deletion at en.ws. —Justin (koavf)TCM 20:49, 13 March 2025 (UTC)