ajout infos des archives de proposition wiki
This commit is contained in:
parent 7665f1d99c
commit 9bd1fddd8a
9 changed files with 2517 additions and 27 deletions
@@ -5,7 +5,7 @@ jour ou de traductions, et publier des suggestions sur Mastodon pour encourager

 ## Vue d'ensemble

-Le projet comprend neuf scripts principaux :
+Le projet comprend dix scripts principaux :

 1. **wiki_compare.py** : Récupère les 50 clés OSM les plus utilisées, compare leurs pages wiki en anglais et en
    français, et identifie celles qui ont besoin de mises à jour.

@@ -15,19 +15,21 @@ Le projet comprend neuf scripts principaux :
    suggestion de traduction sur Mastodon.
 4. **propose_translation.py** : Sélectionne une page wiki (par défaut la première) et utilise Ollama avec le modèle
    "mistral:7b" pour proposer une traduction, qui est sauvegardée dans le fichier outdated_pages.json.
-5. **detect_suspicious_deletions.py** : Analyse les changements récents du wiki OSM pour détecter les suppressions
+5. **suggest_grammar_improvements.py** : Sélectionne une page wiki française (par défaut la première) et utilise grammalecte
+   pour vérifier la grammaire et proposer des améliorations, qui sont sauvegardées dans le fichier outdated_pages.json.
+6. **detect_suspicious_deletions.py** : Analyse les changements récents du wiki OSM pour détecter les suppressions
    suspectes (plus de 20 caractères) et les enregistre dans un fichier JSON pour affichage sur le site web.
-6. **fetch_proposals.py** : Récupère les propositions de tags OSM en cours de vote et les propositions récemment modifiées,
+7. **fetch_proposals.py** : Récupère les propositions de tags OSM en cours de vote et les propositions récemment modifiées,
    et les enregistre dans un fichier JSON pour affichage sur le site web. Les données sont mises en cache pendant une heure
    pour éviter des requêtes trop fréquentes au serveur wiki.
-7. **find_untranslated_french_pages.py** : Identifie les pages wiki françaises qui n'ont pas de traduction en anglais
+8. **find_untranslated_french_pages.py** : Identifie les pages wiki françaises qui n'ont pas de traduction en anglais
    et les enregistre dans un fichier JSON pour affichage sur le site web. Les données sont mises en cache pendant une heure.
-8. **find_pages_unavailable_in_french.py** : Scrape la catégorie des pages non disponibles en français, gère la pagination
+9. **find_pages_unavailable_in_french.py** : Scrape la catégorie des pages non disponibles en français, gère la pagination
    pour récupérer toutes les pages, les groupe par préfixe de langue et priorise les pages commençant par "En:". Les données
    sont mises en cache pendant une heure.
-9. **fetch_osm_fr_groups.py** : Récupère les informations sur les groupes de travail et les groupes locaux d'OSM-FR
-   depuis la section #Pages_des_groupes_locaux et les enregistre dans un fichier JSON pour affichage sur le site web.
-   Les données sont mises en cache pendant une heure.
+10. **fetch_osm_fr_groups.py** : Récupère les informations sur les groupes de travail et les groupes locaux d'OSM-FR
+    depuis la section #Pages_des_groupes_locaux et les enregistre dans un fichier JSON pour affichage sur le site web.
+    Les données sont mises en cache pendant une heure.

 ## Installation

@@ -53,6 +55,12 @@ Pour utiliser le script propose_translation.py, vous devez également installer
 ollama pull mistral:7b
 ```

+Pour utiliser le script suggest_grammar_improvements.py, vous devez installer grammalecte :
+
+```bash
+pip install grammalecte
+```
+
 ## Configuration

 ### Mastodon API

@@ -144,6 +152,28 @@ ollama pull mistral:7b

 Le script enregistre la traduction proposée dans la propriété "proposed_translation" de l'entrée correspondante dans le fichier outdated_pages.json.

+### Suggérer des améliorations grammaticales avec grammalecte
+
+Pour sélectionner une page wiki française (par défaut la première avec une version française) et générer des suggestions d'amélioration grammaticale avec grammalecte :
+
+```bash
+./suggest_grammar_improvements.py
+```
+
+Pour vérifier une page spécifique en utilisant sa clé :
+
+```bash
+./suggest_grammar_improvements.py --page type
+```
+
+Note : Ce script nécessite que grammalecte soit installé. Pour l'installer, exécutez :
+
+```bash
+pip install grammalecte
+```
+
+Le script enregistre les suggestions grammaticales dans la propriété "grammar_suggestions" de l'entrée correspondante dans le fichier outdated_pages.json. Ces suggestions sont ensuite utilisées par Symfony dans le template pour afficher des corrections possibles sur la version française de la page dans une section dédiée.
+
 ### Détecter les suppressions suspectes

 Pour analyser les changements récents du wiki OSM et détecter les suppressions suspectes :

@@ -302,7 +332,28 @@ Contient des informations détaillées sur les pages qui ont besoin de mises à
       "date_diff": 491,
       "word_diff": 700,
       "section_diff": 2,
-      "priority": 250.5
+      "priority": 250.5,
+      "proposed_translation": "Texte de la traduction proposée...",
+      "grammar_suggestions": [
+        {
+          "paragraph": 1,
+          "start": 45,
+          "end": 52,
+          "type": "ACCORD",
+          "message": "Accord avec le nom : « bâtiments » est masculin pluriel.",
+          "suggestions": ["grands"],
+          "context": "...les grandes bâtiments de la ville..."
+        },
+        {
+          "paragraph": 3,
+          "start": 120,
+          "end": 128,
+          "type": "CONJUGAISON",
+          "message": "Conjugaison erronée. Accord avec « ils ».",
+          "suggestions": ["peuvent"],
+          "context": "...les bâtiments peut être classés..."
+        }
+      ]
     },
     {
       "key": "amenity",
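To show how a consumer of outdated_pages.json might read the two properties added above, here is a minimal sketch. It is illustrative only: it assumes the file holds a list of page entries shaped like the README sample; nothing else is taken from the repository.

```python
import json

# Minimal sketch (illustrative): iterate over entries shaped like the README
# sample above and print each grammalecte suggestion.
with open("outdated_pages.json", encoding="utf-8") as f:
    pages = json.load(f)

for page in pages:
    for s in page.get("grammar_suggestions", []):
        # Each suggestion locates the issue (paragraph, start, end) and carries
        # a message plus candidate replacements.
        print(f'{page["key"]} §{s["paragraph"]} [{s["start"]}:{s["end"]}] '
              f'{s["type"]}: {s["message"]} -> {s["suggestions"]}')
```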
567 wiki_compare/archived_proposals.json Normal file
@@ -0,0 +1,567 @@
|
|||
{
|
||||
"last_updated": "2025-08-31T12:12:58.757275",
|
||||
"proposals": [
|
||||
{
|
||||
"title": "Proposal:4WD Only",
|
||||
"url": "https://wiki.openstreetmap.org/wiki/Proposal:4WD_Only",
|
||||
"last_modified": "30 April 2023",
|
||||
"proposer": "Gaffa",
|
||||
"section_count": 0,
|
||||
"link_count": 16,
|
||||
"word_count": 152,
|
||||
"votes": {
|
||||
"approve": {
|
||||
"count": 0,
|
||||
"users": []
|
||||
},
|
||||
"oppose": {
|
||||
"count": 0,
|
||||
"users": []
|
||||
},
|
||||
"abstain": {
|
||||
"count": 0,
|
||||
"users": []
|
||||
}
|
||||
},
|
||||
"total_votes": 0,
|
||||
"approve_percentage": 0,
|
||||
"oppose_percentage": 0,
|
||||
"abstain_percentage": 0
|
||||
},
|
||||
{
|
||||
"title": "Proposal:Access: name space",
|
||||
"url": "https://wiki.openstreetmap.org/wiki/Proposal:Access:_name_space",
|
||||
"last_modified": "30 April 2023",
|
||||
"proposer": "Hawke",
|
||||
"section_count": 0,
|
||||
"link_count": 11,
|
||||
"word_count": 109,
|
||||
"votes": {
|
||||
"approve": {
|
||||
"count": 0,
|
||||
"users": []
|
||||
},
|
||||
"oppose": {
|
||||
"count": 0,
|
||||
"users": []
|
||||
},
|
||||
"abstain": {
|
||||
"count": 0,
|
||||
"users": []
|
||||
}
|
||||
},
|
||||
"total_votes": 0,
|
||||
"approve_percentage": 0,
|
||||
"oppose_percentage": 0,
|
||||
"abstain_percentage": 0
|
||||
},
|
||||
{
|
||||
"title": "Proposal:Add ability to specify ordering-only phone number, sms-only phone numbers and related tags",
|
||||
"url": "https://wiki.openstreetmap.org/wiki/Proposal:Add_ability_to_specify_ordering-only_phone_number,_sms-only_phone_numbers_and_related_tags",
|
||||
"last_modified": "8 June 2025",
|
||||
"proposer": "JOlshefsky",
|
||||
"section_count": 10,
|
||||
"link_count": 104,
|
||||
"word_count": 808,
|
||||
"votes": {
|
||||
"approve": {
|
||||
"count": 11,
|
||||
"users": [
|
||||
{
|
||||
"username": "JOlshefsky",
|
||||
"date": "10:52, 16 July 2024"
|
||||
},
|
||||
{
|
||||
"username": "Mueschel",
|
||||
"date": "09:40, 18 July 2024"
|
||||
},
|
||||
{
|
||||
"username": "Chris2map",
|
||||
"date": "17:11, 18 July 2024"
|
||||
},
|
||||
{
|
||||
"username": "Broiledpeas",
|
||||
"date": "22:20, 18 July 2024"
|
||||
},
|
||||
{
|
||||
"username": "Nospam2005",
|
||||
"date": "19:59, 21 July 2024"
|
||||
},
|
||||
{
|
||||
"username": "GanderPL",
|
||||
"date": "08:50, 22 July 2024"
|
||||
},
|
||||
{
|
||||
"username": "EneaSuper",
|
||||
"date": "14:04, 22 July 2024"
|
||||
},
|
||||
{
|
||||
"username": "Uboot",
|
||||
"date": "07:20, 25 July 2024"
|
||||
},
|
||||
{
|
||||
"username": "Jean-Baptiste",
|
||||
"date": "12:40, 28 July 2024"
|
||||
},
|
||||
{
|
||||
"username": "Emmanuel",
|
||||
"date": "18:14, 28 July 2024"
|
||||
},
|
||||
{
|
||||
"username": "Hocu",
|
||||
"date": "06:06, 31 July 2024"
|
||||
}
|
||||
]
|
||||
},
|
||||
"oppose": {
|
||||
"count": 0,
|
||||
"users": []
|
||||
},
|
||||
"abstain": {
|
||||
"count": 4,
|
||||
"users": [
|
||||
{
|
||||
"username": "Woodpeck",
|
||||
"date": "21:15, 19 July 2024"
|
||||
},
|
||||
{
|
||||
"username": "Hedaja",
|
||||
"date": "15:41, 21 July 2024"
|
||||
},
|
||||
{
|
||||
"username": "501ghost",
|
||||
"date": "06:36, 24 July 2024"
|
||||
},
|
||||
{
|
||||
"username": "Nadjita",
|
||||
"date": "08:33, 26 July 2024"
|
||||
}
|
||||
]
|
||||
}
|
||||
},
|
||||
"total_votes": 15,
|
||||
"approve_percentage": 73.3,
|
||||
"oppose_percentage": 0.0,
|
||||
"abstain_percentage": 26.7
|
||||
},
|
||||
{
|
||||
"title": "Proposal:Add strolling to sac scale and some further refinements",
|
||||
"url": "https://wiki.openstreetmap.org/wiki/Proposal:Add_strolling_to_sac_scale_and_some_further_refinements",
|
||||
"last_modified": "5 November 2024",
|
||||
"proposer": null,
|
||||
"section_count": 20,
|
||||
"link_count": 268,
|
||||
"word_count": 2329,
|
||||
"votes": {
|
||||
"approve": {
|
||||
"count": 21,
|
||||
"users": [
|
||||
{
|
||||
"username": "Supsup",
|
||||
"date": "18:21, 15 October 2024"
|
||||
},
|
||||
{
|
||||
"username": "Cick0",
|
||||
"date": "21:43, 15 October 2024"
|
||||
},
|
||||
{
|
||||
"username": "Fizzie41",
|
||||
"date": "21:47, 15 October 2024"
|
||||
},
|
||||
{
|
||||
"username": "Segubi",
|
||||
"date": null
|
||||
},
|
||||
{
|
||||
"username": "rhhs",
|
||||
"date": "06:43, 16 October 2024"
|
||||
},
|
||||
{
|
||||
"username": "Alan",
|
||||
"date": "07:39, 16 October 2024"
|
||||
},
|
||||
{
|
||||
"username": "VojtaFilip",
|
||||
"date": "08:00, 16 October 2024"
|
||||
},
|
||||
{
|
||||
"username": "Adamfranco",
|
||||
"date": "13:21, 16 October 2024"
|
||||
},
|
||||
{
|
||||
"username": "Yvecai",
|
||||
"date": "05:07, 18 October 2024"
|
||||
},
|
||||
{
|
||||
"username": "Woazboat",
|
||||
"date": "12:15, 18 October 2024"
|
||||
},
|
||||
{
|
||||
"username": "julcnx",
|
||||
"date": "15:16, 20 October 2024"
|
||||
},
|
||||
{
|
||||
"username": "JIDB",
|
||||
"date": "17:17, 20 October 2024"
|
||||
},
|
||||
{
|
||||
"username": "Pb07",
|
||||
"date": "18:04, 20 October 2024"
|
||||
},
|
||||
{
|
||||
"username": "Lumikeiju",
|
||||
"date": "18:46, 20 October 2024"
|
||||
},
|
||||
{
|
||||
"username": "Heilbron",
|
||||
"date": "20:43, 20 October 2024"
|
||||
},
|
||||
{
|
||||
"username": "Aighes",
|
||||
"date": "21:30, 20 October 2024"
|
||||
},
|
||||
{
|
||||
"username": "Crodthauser",
|
||||
"date": "21:55, 20 October 2024"
|
||||
},
|
||||
{
|
||||
"username": "Adiatmad",
|
||||
"date": "02:26, 21 October 2024"
|
||||
},
|
||||
{
|
||||
"username": "Jonathan",
|
||||
"date": "10:46, 21 October 2024"
|
||||
},
|
||||
{
|
||||
"username": "mahau",
|
||||
"date": "17:34, 21 October 2024"
|
||||
},
|
||||
{
|
||||
"username": "EneaSuper",
|
||||
"date": "13:43, 27 October 2024"
|
||||
}
|
||||
]
|
||||
},
|
||||
"oppose": {
|
||||
"count": 4,
|
||||
"users": [
|
||||
{
|
||||
"username": "chris66",
|
||||
"date": "06:22, 16 October 2024"
|
||||
},
|
||||
{
|
||||
"username": "Skyper",
|
||||
"date": "11:55, 16 October 2024"
|
||||
},
|
||||
{
|
||||
"username": "Nop",
|
||||
"date": "06:41, 17 October 2024"
|
||||
},
|
||||
{
|
||||
"username": "Fabi2",
|
||||
"date": "19:27, 20 October 2024"
|
||||
}
|
||||
]
|
||||
},
|
||||
"abstain": {
|
||||
"count": 2,
|
||||
"users": [
|
||||
{
|
||||
"username": "Chris2map",
|
||||
"date": "17:39, 17 October 2024"
|
||||
},
|
||||
{
|
||||
"username": "Nospam2005",
|
||||
"date": "13:40, 20 October 2024"
|
||||
}
|
||||
]
|
||||
}
|
||||
},
|
||||
"total_votes": 27,
|
||||
"approve_percentage": 77.8,
|
||||
"oppose_percentage": 14.8,
|
||||
"abstain_percentage": 7.4
|
||||
}
|
||||
],
|
||||
"statistics": {
|
||||
"total_proposals": 4,
|
||||
"total_votes": 42,
|
||||
"avg_votes_per_proposal": 10.5,
|
||||
"unique_voters": 39,
|
||||
"top_voters": [
|
||||
{
|
||||
"username": "Chris2map",
|
||||
"total": 2,
|
||||
"approve": 1,
|
||||
"oppose": 0,
|
||||
"abstain": 1
|
||||
},
|
||||
{
|
||||
"username": "Nospam2005",
|
||||
"total": 2,
|
||||
"approve": 1,
|
||||
"oppose": 0,
|
||||
"abstain": 1
|
||||
},
|
||||
{
|
||||
"username": "EneaSuper",
|
||||
"total": 2,
|
||||
"approve": 2,
|
||||
"oppose": 0,
|
||||
"abstain": 0
|
||||
},
|
||||
{
|
||||
"username": "JOlshefsky",
|
||||
"total": 1,
|
||||
"approve": 1,
|
||||
"oppose": 0,
|
||||
"abstain": 0
|
||||
},
|
||||
{
|
||||
"username": "Mueschel",
|
||||
"total": 1,
|
||||
"approve": 1,
|
||||
"oppose": 0,
|
||||
"abstain": 0
|
||||
},
|
||||
{
|
||||
"username": "Broiledpeas",
|
||||
"total": 1,
|
||||
"approve": 1,
|
||||
"oppose": 0,
|
||||
"abstain": 0
|
||||
},
|
||||
{
|
||||
"username": "GanderPL",
|
||||
"total": 1,
|
||||
"approve": 1,
|
||||
"oppose": 0,
|
||||
"abstain": 0
|
||||
},
|
||||
{
|
||||
"username": "Uboot",
|
||||
"total": 1,
|
||||
"approve": 1,
|
||||
"oppose": 0,
|
||||
"abstain": 0
|
||||
},
|
||||
{
|
||||
"username": "Jean-Baptiste",
|
||||
"total": 1,
|
||||
"approve": 1,
|
||||
"oppose": 0,
|
||||
"abstain": 0
|
||||
},
|
||||
{
|
||||
"username": "Emmanuel",
|
||||
"total": 1,
|
||||
"approve": 1,
|
||||
"oppose": 0,
|
||||
"abstain": 0
|
||||
},
|
||||
{
|
||||
"username": "Hocu",
|
||||
"total": 1,
|
||||
"approve": 1,
|
||||
"oppose": 0,
|
||||
"abstain": 0
|
||||
},
|
||||
{
|
||||
"username": "Woodpeck",
|
||||
"total": 1,
|
||||
"approve": 0,
|
||||
"oppose": 0,
|
||||
"abstain": 1
|
||||
},
|
||||
{
|
||||
"username": "Hedaja",
|
||||
"total": 1,
|
||||
"approve": 0,
|
||||
"oppose": 0,
|
||||
"abstain": 1
|
||||
},
|
||||
{
|
||||
"username": "501ghost",
|
||||
"total": 1,
|
||||
"approve": 0,
|
||||
"oppose": 0,
|
||||
"abstain": 1
|
||||
},
|
||||
{
|
||||
"username": "Nadjita",
|
||||
"total": 1,
|
||||
"approve": 0,
|
||||
"oppose": 0,
|
||||
"abstain": 1
|
||||
},
|
||||
{
|
||||
"username": "Supsup",
|
||||
"total": 1,
|
||||
"approve": 1,
|
||||
"oppose": 0,
|
||||
"abstain": 0
|
||||
},
|
||||
{
|
||||
"username": "Cick0",
|
||||
"total": 1,
|
||||
"approve": 1,
|
||||
"oppose": 0,
|
||||
"abstain": 0
|
||||
},
|
||||
{
|
||||
"username": "Fizzie41",
|
||||
"total": 1,
|
||||
"approve": 1,
|
||||
"oppose": 0,
|
||||
"abstain": 0
|
||||
},
|
||||
{
|
||||
"username": "Segubi",
|
||||
"total": 1,
|
||||
"approve": 1,
|
||||
"oppose": 0,
|
||||
"abstain": 0
|
||||
},
|
||||
{
|
||||
"username": "rhhs",
|
||||
"total": 1,
|
||||
"approve": 1,
|
||||
"oppose": 0,
|
||||
"abstain": 0
|
||||
},
|
||||
{
|
||||
"username": "Alan",
|
||||
"total": 1,
|
||||
"approve": 1,
|
||||
"oppose": 0,
|
||||
"abstain": 0
|
||||
},
|
||||
{
|
||||
"username": "VojtaFilip",
|
||||
"total": 1,
|
||||
"approve": 1,
|
||||
"oppose": 0,
|
||||
"abstain": 0
|
||||
},
|
||||
{
|
||||
"username": "Adamfranco",
|
||||
"total": 1,
|
||||
"approve": 1,
|
||||
"oppose": 0,
|
||||
"abstain": 0
|
||||
},
|
||||
{
|
||||
"username": "Yvecai",
|
||||
"total": 1,
|
||||
"approve": 1,
|
||||
"oppose": 0,
|
||||
"abstain": 0
|
||||
},
|
||||
{
|
||||
"username": "Woazboat",
|
||||
"total": 1,
|
||||
"approve": 1,
|
||||
"oppose": 0,
|
||||
"abstain": 0
|
||||
},
|
||||
{
|
||||
"username": "julcnx",
|
||||
"total": 1,
|
||||
"approve": 1,
|
||||
"oppose": 0,
|
||||
"abstain": 0
|
||||
},
|
||||
{
|
||||
"username": "JIDB",
|
||||
"total": 1,
|
||||
"approve": 1,
|
||||
"oppose": 0,
|
||||
"abstain": 0
|
||||
},
|
||||
{
|
||||
"username": "Pb07",
|
||||
"total": 1,
|
||||
"approve": 1,
|
||||
"oppose": 0,
|
||||
"abstain": 0
|
||||
},
|
||||
{
|
||||
"username": "Lumikeiju",
|
||||
"total": 1,
|
||||
"approve": 1,
|
||||
"oppose": 0,
|
||||
"abstain": 0
|
||||
},
|
||||
{
|
||||
"username": "Heilbron",
|
||||
"total": 1,
|
||||
"approve": 1,
|
||||
"oppose": 0,
|
||||
"abstain": 0
|
||||
},
|
||||
{
|
||||
"username": "Aighes",
|
||||
"total": 1,
|
||||
"approve": 1,
|
||||
"oppose": 0,
|
||||
"abstain": 0
|
||||
},
|
||||
{
|
||||
"username": "Crodthauser",
|
||||
"total": 1,
|
||||
"approve": 1,
|
||||
"oppose": 0,
|
||||
"abstain": 0
|
||||
},
|
||||
{
|
||||
"username": "Adiatmad",
|
||||
"total": 1,
|
||||
"approve": 1,
|
||||
"oppose": 0,
|
||||
"abstain": 0
|
||||
},
|
||||
{
|
||||
"username": "Jonathan",
|
||||
"total": 1,
|
||||
"approve": 1,
|
||||
"oppose": 0,
|
||||
"abstain": 0
|
||||
},
|
||||
{
|
||||
"username": "mahau",
|
||||
"total": 1,
|
||||
"approve": 1,
|
||||
"oppose": 0,
|
||||
"abstain": 0
|
||||
},
|
||||
{
|
||||
"username": "chris66",
|
||||
"total": 1,
|
||||
"approve": 0,
|
||||
"oppose": 1,
|
||||
"abstain": 0
|
||||
},
|
||||
{
|
||||
"username": "Skyper",
|
||||
"total": 1,
|
||||
"approve": 0,
|
||||
"oppose": 1,
|
||||
"abstain": 0
|
||||
},
|
||||
{
|
||||
"username": "Nop",
|
||||
"total": 1,
|
||||
"approve": 0,
|
||||
"oppose": 1,
|
||||
"abstain": 0
|
||||
},
|
||||
{
|
||||
"username": "Fabi2",
|
||||
"total": 1,
|
||||
"approve": 0,
|
||||
"oppose": 1,
|
||||
"abstain": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
}
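As a quick sanity check of the data shape, a short sketch that loads the file above and reprints the tallies; only the fields visible in the JSON are assumed.

```python
import json

# Sketch (illustrative): summarize the archived proposals file shown above.
with open("wiki_compare/archived_proposals.json", encoding="utf-8") as f:
    data = json.load(f)

for p in data["proposals"]:
    v = p["votes"]
    print(f'{p["title"]}: {v["approve"]["count"]} approve, '
          f'{v["oppose"]["count"]} oppose, {v["abstain"]["count"]} abstain '
          f'({p["approve_percentage"]}% approval)')

s = data["statistics"]
print(f'{s["total_proposals"]} proposals, {s["total_votes"]} votes, '
      f'{s["unique_voters"]} unique voters')
```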
66 wiki_compare/archived_proposals_new.json Normal file
@@ -0,0 +1,66 @@
|
|||
{
|
||||
"last_updated": "2025-08-31T12:11:31.163320",
|
||||
"proposals": [
|
||||
{
|
||||
"title": "Proposal:4WD Only",
|
||||
"url": "https://wiki.openstreetmap.org/wiki/Proposal:4WD_Only",
|
||||
"last_modified": "30 April 2023",
|
||||
"proposer": "Gaffa",
|
||||
"section_count": 0,
|
||||
"link_count": 16,
|
||||
"word_count": 152,
|
||||
"votes": {
|
||||
"approve": {
|
||||
"count": 0,
|
||||
"users": []
|
||||
},
|
||||
"oppose": {
|
||||
"count": 0,
|
||||
"users": []
|
||||
},
|
||||
"abstain": {
|
||||
"count": 0,
|
||||
"users": []
|
||||
}
|
||||
},
|
||||
"total_votes": 0,
|
||||
"approve_percentage": 0,
|
||||
"oppose_percentage": 0,
|
||||
"abstain_percentage": 0
|
||||
},
|
||||
{
|
||||
"title": "Proposal:Access: name space",
|
||||
"url": "https://wiki.openstreetmap.org/wiki/Proposal:Access:_name_space",
|
||||
"last_modified": "30 April 2023",
|
||||
"proposer": "Hawke",
|
||||
"section_count": 0,
|
||||
"link_count": 11,
|
||||
"word_count": 109,
|
||||
"votes": {
|
||||
"approve": {
|
||||
"count": 0,
|
||||
"users": []
|
||||
},
|
||||
"oppose": {
|
||||
"count": 0,
|
||||
"users": []
|
||||
},
|
||||
"abstain": {
|
||||
"count": 0,
|
||||
"users": []
|
||||
}
|
||||
},
|
||||
"total_votes": 0,
|
||||
"approve_percentage": 0,
|
||||
"oppose_percentage": 0,
|
||||
"abstain_percentage": 0
|
||||
}
|
||||
],
|
||||
"statistics": {
|
||||
"total_proposals": 2,
|
||||
"total_votes": 0,
|
||||
"avg_votes_per_proposal": 0.0,
|
||||
"unique_voters": 0,
|
||||
"top_voters": []
|
||||
}
|
||||
}
625 wiki_compare/fetch_archived_proposals.py Normal file
@@ -0,0 +1,625 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

"""
fetch_archived_proposals.py

This script scrapes archived proposals from the OpenStreetMap wiki and extracts voting information.
It analyzes the voting patterns, counts votes by type (approve, oppose, abstain), and collects
information about the users who voted.

The script saves the data to a JSON file that can be used by the Symfony application.

Usage:
    python fetch_archived_proposals.py [--force] [--limit N]

Options:
    --force    Force refresh of all proposals, even if they have already been processed
    --limit N  Limit processing to N proposals (default: process all proposals)

Output:
    - archived_proposals.json file with voting information
"""

import argparse
import json
import logging
import os
import re
import sys
import time
from datetime import datetime
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    datefmt='%Y-%m-%d %H:%M:%S'
)
logger = logging.getLogger(__name__)

# Constants
ARCHIVED_PROPOSALS_URL = "https://wiki.openstreetmap.org/wiki/Category:Archived_proposals"
SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__))
ARCHIVED_PROPOSALS_FILE = os.path.join(SCRIPT_DIR, "archived_proposals.json")
USER_AGENT = "OSM-Commerces/1.0 (https://github.com/yourusername/osm-commerces; your@email.com)"
RATE_LIMIT_DELAY = 1  # seconds between requests to avoid rate limiting

# Vote patterns
VOTE_PATTERNS = {
    'approve': [
        r'I\s+(?:(?:strongly|fully|completely|wholeheartedly)\s+)?(?:approve|support|agree\s+with)\s+this\s+proposal',
        r'I\s+vote\s+(?:to\s+)?(?:approve|support)',
        r'(?:Symbol\s+support\s+vote\.svg|Symbol_support_vote\.svg)',
    ],
    'oppose': [
        r'I\s+(?:(?:strongly|fully|completely|wholeheartedly)\s+)?(?:oppose|disagree\s+with|reject|do\s+not\s+support)\s+this\s+proposal',
        r'I\s+vote\s+(?:to\s+)?(?:oppose|reject|against)',
        r'(?:Symbol\s+oppose\s+vote\.svg|Symbol_oppose_vote\.svg)',
    ],
    'abstain': [
        r'I\s+(?:have\s+comments\s+but\s+)?abstain\s+from\s+voting',
        r'I\s+(?:have\s+comments\s+but\s+)?(?:neither\s+approve\s+nor\s+oppose|am\s+neutral)',
        r'(?:Symbol\s+abstain\s+vote\.svg|Symbol_abstain_vote\.svg)',
    ]
}
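
# Usage note (illustrative): determine_vote_type() below applies these patterns
# case-insensitively, so a vote line such as
#   "I approve this proposal. --[[User:Alice|Alice]] 10:52, 16 July 2024"
# resolves to 'approve', while a line embedding "Symbol_oppose_vote.svg"
# resolves to 'oppose'.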
|
||||
|
||||
def parse_arguments():
|
||||
"""Parse command line arguments"""
|
||||
parser = argparse.ArgumentParser(description='Fetch and analyze archived OSM proposals')
|
||||
parser.add_argument('--force', action='store_true', help='Force refresh of all proposals')
|
||||
parser.add_argument('--limit', type=int, help='Limit processing to N proposals (default: process all)')
|
||||
return parser.parse_args()
|
||||
|
||||
def load_existing_data():
|
||||
"""Load existing archived proposals data if available"""
|
||||
if os.path.exists(ARCHIVED_PROPOSALS_FILE):
|
||||
try:
|
||||
with open(ARCHIVED_PROPOSALS_FILE, 'r', encoding='utf-8') as f:
|
||||
data = json.load(f)
|
||||
logger.info(f"Loaded {len(data.get('proposals', []))} existing proposals from {ARCHIVED_PROPOSALS_FILE}")
|
||||
return data
|
||||
except (json.JSONDecodeError, IOError) as e:
|
||||
logger.error(f"Error loading existing data: {e}")
|
||||
|
||||
# Return empty structure if file doesn't exist or has errors
|
||||
return {
|
||||
'last_updated': None,
|
||||
'proposals': []
|
||||
}
|
||||
|
||||
def save_data(data):
|
||||
"""Save data to JSON file"""
|
||||
try:
|
||||
# Update last_updated timestamp
|
||||
data['last_updated'] = datetime.now().isoformat()
|
||||
|
||||
with open(ARCHIVED_PROPOSALS_FILE, 'w', encoding='utf-8') as f:
|
||||
json.dump(data, f, indent=2, ensure_ascii=False)
|
||||
|
||||
logger.info(f"Saved {len(data.get('proposals', []))} proposals to {ARCHIVED_PROPOSALS_FILE}")
|
||||
except IOError as e:
|
||||
logger.error(f"Error saving data: {e}")
|
||||
except Exception as e:
|
||||
logger.error(f"Unexpected error saving data: {e}")
|
||||
|
||||
def fetch_page(url):
|
||||
"""Fetch a page from the OSM wiki"""
|
||||
headers = {
|
||||
'User-Agent': USER_AGENT
|
||||
}
|
||||
|
||||
try:
|
||||
response = requests.get(url, headers=headers)
|
||||
response.raise_for_status()
|
||||
return response.text
|
||||
except requests.exceptions.RequestException as e:
|
||||
logger.error(f"Error fetching {url}: {e}")
|
||||
return None
|
||||
|
||||
def get_proposal_urls():
|
||||
"""Get URLs of all archived proposals"""
|
||||
logger.info(f"Fetching archived proposals list from {ARCHIVED_PROPOSALS_URL}")
|
||||
|
||||
html = fetch_page(ARCHIVED_PROPOSALS_URL)
|
||||
if not html:
|
||||
return []
|
||||
|
||||
soup = BeautifulSoup(html, 'html.parser')
|
||||
|
||||
# Find all links in the category pages
|
||||
proposal_urls = []
|
||||
|
||||
# Get proposals from the main category page
|
||||
category_content = soup.select_one('#mw-pages')
|
||||
if category_content:
|
||||
for link in category_content.select('a'):
|
||||
if link.get('title') and 'Category:' not in link.get('title'):
|
||||
proposal_urls.append({
|
||||
'title': link.get('title'),
|
||||
'url': urljoin(ARCHIVED_PROPOSALS_URL, link.get('href'))
|
||||
})
|
||||
|
||||
# Check if there are subcategories
|
||||
subcategories = soup.select('#mw-subcategories a')
|
||||
for subcat in subcategories:
|
||||
if 'Category:' in subcat.get('title', ''):
|
||||
logger.info(f"Found subcategory: {subcat.get('title')}")
|
||||
subcat_url = urljoin(ARCHIVED_PROPOSALS_URL, subcat.get('href'))
|
||||
|
||||
# Fetch the subcategory page
|
||||
time.sleep(RATE_LIMIT_DELAY) # Respect rate limits
|
||||
subcat_html = fetch_page(subcat_url)
|
||||
if subcat_html:
|
||||
subcat_soup = BeautifulSoup(subcat_html, 'html.parser')
|
||||
subcat_content = subcat_soup.select_one('#mw-pages')
|
||||
if subcat_content:
|
||||
for link in subcat_content.select('a'):
|
||||
if link.get('title') and 'Category:' not in link.get('title'):
|
||||
proposal_urls.append({
|
||||
'title': link.get('title'),
|
||||
'url': urljoin(ARCHIVED_PROPOSALS_URL, link.get('href'))
|
||||
})
|
||||
|
||||
logger.info(f"Found {len(proposal_urls)} archived proposals")
|
||||
return proposal_urls
|
||||
|
||||
def extract_username(text):
|
||||
"""Extract username from a signature line"""
|
||||
# Common patterns for signatures
|
||||
patterns = [
|
||||
r'--\s*\[\[User:([^|\]]+)(?:\|[^\]]+)?\]\]', # --[[User:Username|Username]]
|
||||
r'--\s*\[\[User:([^|\]]+)\]\]', # --[[User:Username]]
|
||||
r'--\s*\[\[User talk:([^|\]]+)(?:\|[^\]]+)?\]\]', # --[[User talk:Username|Username]]
|
||||
r'--\s*\[\[User talk:([^|\]]+)\]\]', # --[[User talk:Username]]
|
||||
r'--\s*\[\[Special:Contributions/([^|\]]+)(?:\|[^\]]+)?\]\]', # --[[Special:Contributions/Username|Username]]
|
||||
r'--\s*\[\[Special:Contributions/([^|\]]+)\]\]', # --[[Special:Contributions/Username]]
|
||||
]
|
||||
|
||||
for pattern in patterns:
|
||||
match = re.search(pattern, text)
|
||||
if match:
|
||||
return match.group(1).strip()
|
||||
|
||||
# If no match found with the patterns, try to find any username-like string
|
||||
match = re.search(r'--\s*([A-Za-z0-9_-]+)', text)
|
||||
if match:
|
||||
return match.group(1).strip()
|
||||
|
||||
return None
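# Example (illustrative): extract_username("I approve. --[[User:Hawke|Hawke]]")
# returns "Hawke"; the looser fallback pattern also accepts a bare "--Hawke".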
|
||||
|
||||
def extract_date(text):
|
||||
"""Extract date from a signature line"""
|
||||
# Look for common date formats in signatures
|
||||
date_patterns = [
|
||||
r'(\d{1,2}:\d{2}, \d{1,2} [A-Za-z]+ \d{4})', # 15:30, 25 December 2023
|
||||
r'(\d{1,2} [A-Za-z]+ \d{4} \d{1,2}:\d{2})', # 25 December 2023 15:30
|
||||
r'(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})', # 2023-12-25T15:30:00
|
||||
]
|
||||
|
||||
for pattern in date_patterns:
|
||||
match = re.search(pattern, text)
|
||||
if match:
|
||||
return match.group(1)
|
||||
|
||||
return None
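# Example (illustrative): extract_date("... --Hawke 10:52, 16 July 2024 (UTC)")
# returns "10:52, 16 July 2024", matched by the first pattern.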
|
||||
|
||||
def determine_vote_type(text):
|
||||
"""Determine the type of vote from the text"""
|
||||
text_lower = text.lower()
|
||||
|
||||
for vote_type, patterns in VOTE_PATTERNS.items():
|
||||
for pattern in patterns:
|
||||
if re.search(pattern, text_lower, re.IGNORECASE):
|
||||
return vote_type
|
||||
|
||||
return None
|
||||
|
||||
def extract_votes(html):
|
||||
"""Extract voting information from proposal HTML"""
|
||||
soup = BeautifulSoup(html, 'html.parser')
|
||||
|
||||
# Find the voting section
|
||||
voting_section = None
|
||||
for heading in soup.find_all(['h2', 'h3']):
|
||||
heading_text = heading.get_text().lower()
|
||||
if 'voting' in heading_text or 'votes' in heading_text or 'poll' in heading_text:
|
||||
voting_section = heading
|
||||
break
|
||||
|
||||
if not voting_section:
|
||||
logger.warning("No voting section found")
|
||||
return {
|
||||
'approve': {'count': 0, 'users': []},
|
||||
'oppose': {'count': 0, 'users': []},
|
||||
'abstain': {'count': 0, 'users': []}
|
||||
}
|
||||
|
||||
# Get the content after the voting section heading
|
||||
votes_content = []
|
||||
current = voting_section.next_sibling
|
||||
|
||||
# Collect all elements until the next heading or the end of the document
|
||||
while current and current.name not in ['h2', 'h3']:
|
||||
if current.name: # Skip NavigableString objects
|
||||
votes_content.append(current)
|
||||
current = current.next_sibling
|
||||
|
||||
# Process vote lists
|
||||
votes = {
|
||||
'approve': {'count': 0, 'users': []},
|
||||
'oppose': {'count': 0, 'users': []},
|
||||
'abstain': {'count': 0, 'users': []}
|
||||
}
|
||||
|
||||
# For tracking vote dates to calculate duration
|
||||
all_vote_dates = []
|
||||
|
||||
# Look for lists of votes
|
||||
for element in votes_content:
|
||||
if element.name == 'ul':
|
||||
for li in element.find_all('li'):
|
||||
vote_text = li.get_text()
|
||||
vote_type = determine_vote_type(vote_text)
|
||||
|
||||
if vote_type:
|
||||
username = extract_username(vote_text)
|
||||
date = extract_date(vote_text)
|
||||
|
||||
# Extract comment by removing vote declaration and signature
|
||||
comment = vote_text
|
||||
|
||||
# Remove vote declaration patterns
|
||||
for pattern in VOTE_PATTERNS[vote_type]:
|
||||
comment = re.sub(pattern, '', comment, flags=re.IGNORECASE)
|
||||
|
||||
# Remove signature
|
||||
signature_patterns = [
|
||||
r'--\s*\[\[User:[^]]+\]\].*$',
|
||||
r'--\s*\[\[User talk:[^]]+\]\].*$',
|
||||
r'--\s*\[\[Special:Contributions/[^]]+\]\].*$',
|
||||
r'--\s*[A-Za-z0-9_-]+.*$'
|
||||
]
|
||||
for pattern in signature_patterns:
|
||||
comment = re.sub(pattern, '', comment, flags=re.IGNORECASE)
|
||||
|
||||
# Clean up the comment
|
||||
comment = comment.strip()
|
||||
|
||||
if username:
|
||||
votes[vote_type]['count'] += 1
|
||||
votes[vote_type]['users'].append({
|
||||
'username': username,
|
||||
'date': date,
|
||||
'comment': comment
|
||||
})
|
||||
|
||||
# Add date to list for duration calculation if it's valid
|
||||
if date:
|
||||
try:
|
||||
# Try to parse the date in different formats
|
||||
parsed_date = None
|
||||
for date_format in [
|
||||
'%H:%M, %d %B %Y', # 15:30, 25 December 2023
|
||||
'%d %B %Y %H:%M', # 25 December 2023 15:30
|
||||
'%Y-%m-%dT%H:%M:%S' # 2023-12-25T15:30:00
|
||||
]:
|
||||
try:
|
||||
parsed_date = datetime.strptime(date, date_format)
|
||||
break
|
||||
except ValueError:
|
||||
continue
|
||||
|
||||
if parsed_date:
|
||||
all_vote_dates.append(parsed_date)
|
||||
except Exception as e:
|
||||
logger.warning(f"Could not parse date '{date}': {e}")
|
||||
|
||||
# Calculate vote duration if we have at least two dates
|
||||
if len(all_vote_dates) >= 2:
|
||||
all_vote_dates.sort()
|
||||
first_vote = all_vote_dates[0]
|
||||
last_vote = all_vote_dates[-1]
|
||||
vote_duration_days = (last_vote - first_vote).days
|
||||
votes['first_vote'] = first_vote.strftime('%Y-%m-%d')
|
||||
votes['last_vote'] = last_vote.strftime('%Y-%m-%d')
|
||||
votes['duration_days'] = vote_duration_days
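# Worked example (illustrative): a first vote on 16 July 2024 and a last vote
# on 31 July 2024 give duration_days == 15.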
|
||||
|
||||
return votes
|
||||
|
||||
def extract_proposal_metadata(html, url):
|
||||
"""Extract metadata about the proposal"""
|
||||
soup = BeautifulSoup(html, 'html.parser')
|
||||
|
||||
# Get title
|
||||
title_element = soup.select_one('#firstHeading')
|
||||
title = title_element.get_text() if title_element else "Unknown Title"
|
||||
|
||||
# Get last modified date
|
||||
last_modified = None
|
||||
footer_info = soup.select_one('#footer-info-lastmod')
|
||||
if footer_info:
|
||||
last_modified_text = footer_info.get_text()
|
||||
match = re.search(r'(\d{1,2} [A-Za-z]+ \d{4})', last_modified_text)
|
||||
if match:
|
||||
last_modified = match.group(1)
|
||||
|
||||
# Get content element for further processing
|
||||
content = soup.select_one('#mw-content-text')
|
||||
|
||||
# Get proposer from the page
|
||||
proposer = None
|
||||
|
||||
# Get proposal status from the page
|
||||
status = None
|
||||
|
||||
# Look for table rows to find proposer and status
|
||||
if content:
|
||||
# Look for table rows
|
||||
for row in content.select('tr'):
|
||||
# Check if the row has at least two cells (th and td)
|
||||
cells = row.select('th, td')
|
||||
if len(cells) >= 2:
|
||||
# Get the header text from the first cell
|
||||
header_text = cells[0].get_text().strip().lower()
|
||||
|
||||
# Check for "Proposed by:" to find proposer
|
||||
if "proposed by" in header_text:
|
||||
# Look for user link in the next cell
|
||||
user_link = cells[1].select_one('a[href*="/wiki/User:"]')
|
||||
if user_link:
|
||||
# Extract username from the link
|
||||
href = user_link.get('href', '')
|
||||
link_title = user_link.get('title', '')  # renamed so the page title above is not shadowed
|
||||
|
||||
# Try to get username from title attribute first
|
||||
if link_title and link_title.startswith('User:'):
|
||||
proposer = link_title[5:] # Remove 'User:' prefix
|
||||
# Otherwise try to extract from href
|
||||
elif href:
|
||||
href_match = re.search(r'/wiki/User:([^/]+)', href)
|
||||
if href_match:
|
||||
proposer = href_match.group(1)
|
||||
|
||||
# If still no proposer, use the link text
|
||||
if not proposer and user_link.get_text():
|
||||
proposer = user_link.get_text().strip()
|
||||
|
||||
logger.info(f"Found proposer in table: {proposer}")
|
||||
|
||||
# Check for "Proposal status:" to find status
|
||||
elif "proposal status" in header_text:
|
||||
# Get the status from the next cell
|
||||
status_cell = cells[1]
|
||||
|
||||
# First try to find a link with a category title containing status
|
||||
status_link = status_cell.select_one('a[title*="Category:Proposals with"]')
|
||||
if status_link:
|
||||
# Extract status from the title attribute
|
||||
status_match = re.search(r'Category:Proposals with "([^"]+)" status', status_link.get('title', ''))
|
||||
if status_match:
|
||||
status = status_match.group(1)
|
||||
logger.info(f"Found status in table link: {status}")
|
||||
|
||||
# If no status found in link, try to get text content
|
||||
if not status:
|
||||
status_text = status_cell.get_text().strip()
|
||||
# Try to match one of the known statuses
|
||||
known_statuses = [
|
||||
"Draft", "Proposed", "Voting", "Post-vote", "Approved",
|
||||
"Rejected", "Abandoned", "Canceled", "Obsoleted",
|
||||
"Inactive", "Undefined"
|
||||
]
|
||||
for known_status in known_statuses:
|
||||
if known_status.lower() in status_text.lower():
|
||||
status = known_status
|
||||
logger.info(f"Found status in table text: {status}")
|
||||
break
|
||||
|
||||
# If no proposer found in table, try the first paragraph method
|
||||
if not proposer:
|
||||
first_paragraph = soup.select_one('#mw-content-text p')
|
||||
if first_paragraph:
|
||||
proposer_match = re.search(r'(?:proposed|created|authored)\s+by\s+\[\[User:([^|\]]+)', first_paragraph.get_text())
|
||||
if proposer_match:
|
||||
proposer = proposer_match.group(1)
|
||||
logger.info(f"Found proposer in paragraph: {proposer}")
|
||||
|
||||
# Count sections, links, and words
|
||||
section_count = len(soup.select('#mw-content-text h2, #mw-content-text h3, #mw-content-text h4')) if content else 0
|
||||
|
||||
# Count links excluding user/talk pages (voting signatures)
|
||||
links = []
|
||||
if content:
|
||||
for link in content.select('a'):
|
||||
href = link.get('href', '')
|
||||
if href and not re.search(r'User:|User_talk:|Special:Contributions', href):
|
||||
links.append(href)
|
||||
link_count = len(links)
|
||||
|
||||
# Approximate word count
|
||||
word_count = 0
|
||||
if content:
|
||||
# Get text content excluding navigation elements
|
||||
for nav in content.select('.navbox, .ambox, .tmbox, .mw-editsection'):
|
||||
nav.decompose()
|
||||
|
||||
# Also exclude the voting section to count only the proposal content
|
||||
voting_section = None
|
||||
for heading in content.find_all(['h2', 'h3']):
|
||||
heading_text = heading.get_text().lower()
|
||||
if 'voting' in heading_text or 'votes' in heading_text or 'poll' in heading_text:
|
||||
voting_section = heading
|
||||
break
|
||||
|
||||
if voting_section:
|
||||
# Remove the voting section and everything after it
|
||||
current = voting_section
|
||||
while current:
|
||||
next_sibling = current.next_sibling
|
||||
current.decompose()
|
||||
current = next_sibling
|
||||
|
||||
# Count words in the remaining content
|
||||
text = content.get_text()
|
||||
word_count = len(re.findall(r'\b\w+\b', text))
|
||||
|
||||
return {
|
||||
'title': title,
|
||||
'url': url,
|
||||
'last_modified': last_modified,
|
||||
'proposer': proposer,
|
||||
'status': status,
|
||||
'section_count': section_count,
|
||||
'link_count': link_count,
|
||||
'word_count': word_count
|
||||
}
|
||||
|
||||
def process_proposal(proposal, force=False):
|
||||
"""Process a single proposal and extract voting information"""
|
||||
url = proposal['url']
|
||||
title = proposal['title']
|
||||
|
||||
logger.info(f"Processing proposal: {title}")
|
||||
|
||||
# Fetch the proposal page
|
||||
html = fetch_page(url)
|
||||
if not html:
|
||||
return None
|
||||
|
||||
# Extract metadata
|
||||
metadata = extract_proposal_metadata(html, url)
|
||||
|
||||
# Extract votes
|
||||
votes = extract_votes(html)
|
||||
|
||||
# Combine metadata and votes
|
||||
result = {**metadata, 'votes': votes}
|
||||
|
||||
# Calculate total votes and percentages
|
||||
total_votes = votes['approve']['count'] + votes['oppose']['count'] + votes['abstain']['count']
|
||||
|
||||
if total_votes > 0:
|
||||
result['total_votes'] = total_votes
|
||||
result['approve_percentage'] = round((votes['approve']['count'] / total_votes) * 100, 1)
|
||||
result['oppose_percentage'] = round((votes['oppose']['count'] / total_votes) * 100, 1)
|
||||
result['abstain_percentage'] = round((votes['abstain']['count'] / total_votes) * 100, 1)
|
||||
else:
|
||||
result['total_votes'] = 0
|
||||
result['approve_percentage'] = 0
|
||||
result['oppose_percentage'] = 0
|
||||
result['abstain_percentage'] = 0
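# Worked example (illustrative): 11 approve, 0 oppose and 4 abstain votes
# (15 total) yield 73.3 / 0.0 / 26.7, the percentages seen in
# archived_proposals.json above.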
|
||||
|
||||
return result
|
||||
|
||||
def main():
|
||||
"""Main function to execute the script"""
|
||||
args = parse_arguments()
|
||||
force = args.force
|
||||
limit = args.limit
|
||||
|
||||
logger.info("Starting fetch_archived_proposals.py")
|
||||
if limit:
|
||||
logger.info(f"Processing limited to {limit} proposals")
|
||||
|
||||
# Load existing data
|
||||
data = load_existing_data()
|
||||
|
||||
# Get list of proposal URLs
|
||||
proposal_urls = get_proposal_urls()
|
||||
|
||||
# Apply limit if specified
|
||||
if limit and limit < len(proposal_urls):
|
||||
logger.info(f"Limiting processing from {len(proposal_urls)} to {limit} proposals")
|
||||
proposal_urls = proposal_urls[:limit]
|
||||
|
||||
# Create a map of existing proposals by URL for quick lookup
|
||||
existing_proposals = {p['url']: p for p in data.get('proposals', [])}
|
||||
|
||||
# Process each proposal
|
||||
new_proposals = []
|
||||
processed_count = 0
|
||||
for proposal in proposal_urls:
|
||||
url = proposal['url']
|
||||
original_title = proposal['title']
|
||||
|
||||
# Skip if already processed and not forcing refresh
|
||||
if url in existing_proposals and not force:
|
||||
logger.info(f"Skipping already processed proposal: {original_title}")
|
||||
new_proposals.append(existing_proposals[url])
|
||||
continue
|
||||
|
||||
# Process the proposal
|
||||
time.sleep(RATE_LIMIT_DELAY) # Respect rate limits
|
||||
processed = process_proposal(proposal, force)
|
||||
|
||||
if processed:
|
||||
# Ensure the title is preserved from the original proposal
|
||||
if processed.get('title') != original_title:
|
||||
logger.warning(f"Title changed during processing from '{original_title}' to '{processed.get('title')}'. Restoring original title.")
|
||||
processed['title'] = original_title
|
||||
|
||||
new_proposals.append(processed)
|
||||
processed_count += 1
|
||||
|
||||
# Check if we've reached the limit
|
||||
if limit and processed_count >= limit:
|
||||
logger.info(f"Reached limit of {limit} processed proposals")
|
||||
break
|
||||
|
||||
# Update the data
|
||||
data['proposals'] = new_proposals
|
||||
|
||||
# Calculate global statistics
|
||||
total_proposals = len(new_proposals)
|
||||
total_votes = sum(p.get('total_votes', 0) for p in new_proposals)
|
||||
avg_votes_per_proposal = round(total_votes / total_proposals, 1) if total_proposals > 0 else 0
|
||||
|
||||
# Count unique voters
|
||||
all_voters = set()
|
||||
for p in new_proposals:
|
||||
for vote_type in ['approve', 'oppose', 'abstain']:
|
||||
for user in p.get('votes', {}).get(vote_type, {}).get('users', []):
|
||||
if 'username' in user:
|
||||
all_voters.add(user['username'])
|
||||
|
||||
# Find most active voters
|
||||
voter_counts = {}
|
||||
for p in new_proposals:
|
||||
for vote_type in ['approve', 'oppose', 'abstain']:
|
||||
for user in p.get('votes', {}).get(vote_type, {}).get('users', []):
|
||||
if 'username' in user:
|
||||
username = user['username']
|
||||
if username not in voter_counts:
|
||||
voter_counts[username] = {'total': 0, 'approve': 0, 'oppose': 0, 'abstain': 0}
|
||||
voter_counts[username]['total'] += 1
|
||||
voter_counts[username][vote_type] += 1
|
||||
|
||||
# Sort voters by total votes
|
||||
top_voters = sorted(
|
||||
[{'username': k, **v} for k, v in voter_counts.items()],
|
||||
key=lambda x: x['total'],
|
||||
reverse=True
|
||||
)[:100]  # Keep the 100 most active voters
|
||||
|
||||
# Add statistics to the data
|
||||
data['statistics'] = {
|
||||
'total_proposals': total_proposals,
|
||||
'total_votes': total_votes,
|
||||
'avg_votes_per_proposal': avg_votes_per_proposal,
|
||||
'unique_voters': len(all_voters),
|
||||
'top_voters': top_voters
|
||||
}
|
||||
|
||||
# Save the data
|
||||
save_data(data)
|
||||
|
||||
logger.info("Script completed successfully")
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
wiki_compare/fetch_proposals.py
@@ -7,6 +7,8 @@ import json
 import logging
 import argparse
 import os
+import re
+import time
 from datetime import datetime, timedelta

 # Configure logging
@@ -26,6 +28,25 @@ OUTPUT_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'proposal
 # Cache timeout (in hours)
 CACHE_TIMEOUT = 1

+# Vote patterns (same as in fetch_archived_proposals.py)
+VOTE_PATTERNS = {
+    'approve': [
+        r'I\s+(?:(?:strongly|fully|completely|wholeheartedly)\s+)?(?:approve|support|agree\s+with)\s+this\s+proposal',
+        r'I\s+vote\s+(?:to\s+)?(?:approve|support)',
+        r'(?:Symbol\s+support\s+vote\.svg|Symbol_support_vote\.svg)',
+    ],
+    'oppose': [
+        r'I\s+(?:(?:strongly|fully|completely|wholeheartedly)\s+)?(?:oppose|disagree\s+with|reject|do\s+not\s+support)\s+this\s+proposal',
+        r'I\s+vote\s+(?:to\s+)?(?:oppose|reject|against)',
+        r'(?:Symbol\s+oppose\s+vote\.svg|Symbol_oppose_vote\.svg)',
+    ],
+    'abstain': [
+        r'I\s+(?:have\s+comments\s+but\s+)?abstain\s+from\s+voting',
+        r'I\s+(?:have\s+comments\s+but\s+)?(?:neither\s+approve\s+nor\s+oppose|am\s+neutral)',
+        r'(?:Symbol\s+abstain\s+vote\.svg|Symbol_abstain_vote\.svg)',
+    ]
+}
+
 def should_update_cache():
     """
     Check if the cache file exists and if it's older than the cache timeout
@@ -46,6 +67,134 @@ def should_update_cache():
|
|||
logger.info(f"Cache is still fresh (less than {CACHE_TIMEOUT} hour(s) old)")
|
||||
return False
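# Example (illustrative): with CACHE_TIMEOUT = 1, an output file written at
# 11:00 keeps should_update_cache() returning False until 12:00, after which
# it returns True and the data is fetched again.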
|
||||
|
||||
def fetch_page(url):
|
||||
"""
|
||||
Fetch a page from the OSM wiki
|
||||
"""
|
||||
try:
|
||||
response = requests.get(url)
|
||||
response.raise_for_status()
|
||||
return response.text
|
||||
except requests.exceptions.RequestException as e:
|
||||
logger.error(f"Error fetching {url}: {e}")
|
||||
return None
|
||||
|
||||
def extract_username(text):
|
||||
"""
|
||||
Extract username from a signature line
|
||||
"""
|
||||
# Common patterns for signatures
|
||||
patterns = [
|
||||
r'--\s*\[\[User:([^|\]]+)(?:\|[^\]]+)?\]\]', # --[[User:Username|Username]]
|
||||
r'--\s*\[\[User:([^|\]]+)\]\]', # --[[User:Username]]
|
||||
r'--\s*\[\[User talk:([^|\]]+)(?:\|[^\]]+)?\]\]', # --[[User talk:Username|Username]]
|
||||
r'--\s*\[\[User talk:([^|\]]+)\]\]', # --[[User talk:Username]]
|
||||
r'--\s*\[\[Special:Contributions/([^|\]]+)(?:\|[^\]]+)?\]\]', # --[[Special:Contributions/Username|Username]]
|
||||
r'--\s*\[\[Special:Contributions/([^|\]]+)\]\]', # --[[Special:Contributions/Username]]
|
||||
]
|
||||
|
||||
for pattern in patterns:
|
||||
match = re.search(pattern, text)
|
||||
if match:
|
||||
return match.group(1).strip()
|
||||
|
||||
# If no match found with the patterns, try to find any username-like string
|
||||
match = re.search(r'--\s*([A-Za-z0-9_-]+)', text)
|
||||
if match:
|
||||
return match.group(1).strip()
|
||||
|
||||
return None
|
||||
|
||||
def extract_date(text):
|
||||
"""
|
||||
Extract date from a signature line
|
||||
"""
|
||||
# Look for common date formats in signatures
|
||||
date_patterns = [
|
||||
r'(\d{1,2}:\d{2}, \d{1,2} [A-Za-z]+ \d{4})', # 15:30, 25 December 2023
|
||||
r'(\d{1,2} [A-Za-z]+ \d{4} \d{1,2}:\d{2})', # 25 December 2023 15:30
|
||||
r'(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})', # 2023-12-25T15:30:00
|
||||
]
|
||||
|
||||
for pattern in date_patterns:
|
||||
match = re.search(pattern, text)
|
||||
if match:
|
||||
return match.group(1)
|
||||
|
||||
return None
|
||||
|
||||
def determine_vote_type(text):
|
||||
"""
|
||||
Determine the type of vote from the text
|
||||
"""
|
||||
text_lower = text.lower()
|
||||
|
||||
for vote_type, patterns in VOTE_PATTERNS.items():
|
||||
for pattern in patterns:
|
||||
if re.search(pattern, text_lower, re.IGNORECASE):
|
||||
return vote_type
|
||||
|
||||
return None
|
||||
|
||||
def extract_votes(html):
|
||||
"""
|
||||
Extract voting information from proposal HTML
|
||||
"""
|
||||
soup = BeautifulSoup(html, 'html.parser')
|
||||
|
||||
# Find the voting section
|
||||
voting_section = None
|
||||
for heading in soup.find_all(['h2', 'h3']):
|
||||
heading_text = heading.get_text().lower()
|
||||
if 'voting' in heading_text or 'votes' in heading_text or 'poll' in heading_text:
|
||||
voting_section = heading
|
||||
break
|
||||
|
||||
if not voting_section:
|
||||
logger.warning("No voting section found")
|
||||
return {
|
||||
'approve': {'count': 0, 'users': []},
|
||||
'oppose': {'count': 0, 'users': []},
|
||||
'abstain': {'count': 0, 'users': []}
|
||||
}
|
||||
|
||||
# Get the content after the voting section heading
|
||||
votes_content = []
|
||||
current = voting_section.next_sibling
|
||||
|
||||
# Collect all elements until the next heading or the end of the document
|
||||
while current and current.name not in ['h2', 'h3']:
|
||||
if current.name: # Skip NavigableString objects
|
||||
votes_content.append(current)
|
||||
current = current.next_sibling
|
||||
|
||||
# Process vote lists
|
||||
votes = {
|
||||
'approve': {'count': 0, 'users': []},
|
||||
'oppose': {'count': 0, 'users': []},
|
||||
'abstain': {'count': 0, 'users': []}
|
||||
}
|
||||
|
||||
# Look for lists of votes
|
||||
for element in votes_content:
|
||||
if element.name == 'ul':
|
||||
for li in element.find_all('li'):
|
||||
vote_text = li.get_text()
|
||||
vote_type = determine_vote_type(vote_text)
|
||||
|
||||
if vote_type:
|
||||
username = extract_username(vote_text)
|
||||
date = extract_date(vote_text)
|
||||
|
||||
if username:
|
||||
votes[vote_type]['count'] += 1
|
||||
votes[vote_type]['users'].append({
|
||||
'username': username,
|
||||
'date': date
|
||||
})
|
||||
|
||||
return votes
|
||||
|
||||
def fetch_voting_proposals():
|
||||
"""
|
||||
Fetch proposals with "Voting" status from the OSM Wiki
|
||||
|
@@ -69,12 +218,72 @@ def fetch_voting_proposals():
|
|||
proposal_title = link.text.strip()
|
||||
proposal_url = 'https://wiki.openstreetmap.org' + link.get('href', '')
|
||||
|
||||
-            proposals.append({
+            # Create a basic proposal object
+            proposal = {
                 'title': proposal_title,
                 'url': proposal_url,
                 'status': 'Voting',
                 'type': 'voting'
-            })
+            }
|
||||
|
||||
# Fetch the proposal page to extract voting information
|
||||
logger.info(f"Fetching proposal page: {proposal_title}")
|
||||
html = fetch_page(proposal_url)
|
||||
|
||||
if html:
|
||||
# Extract voting information
|
||||
votes = extract_votes(html)
|
||||
|
||||
# Add voting information to the proposal
|
||||
proposal['votes'] = votes
|
||||
|
||||
# Calculate total votes and percentages
|
||||
total_votes = votes['approve']['count'] + votes['oppose']['count'] + votes['abstain']['count']
|
||||
|
||||
if total_votes > 0:
|
||||
proposal['total_votes'] = total_votes
|
||||
proposal['approve_percentage'] = round((votes['approve']['count'] / total_votes) * 100, 1)
|
||||
proposal['oppose_percentage'] = round((votes['oppose']['count'] / total_votes) * 100, 1)
|
||||
proposal['abstain_percentage'] = round((votes['abstain']['count'] / total_votes) * 100, 1)
|
||||
else:
|
||||
proposal['total_votes'] = 0
|
||||
proposal['approve_percentage'] = 0
|
||||
proposal['oppose_percentage'] = 0
|
||||
proposal['abstain_percentage'] = 0
|
||||
|
||||
# Extract proposer from the page
|
||||
soup = BeautifulSoup(html, 'html.parser')
|
||||
content = soup.select_one('#mw-content-text')
|
||||
|
||||
if content:
|
||||
# Look for table rows with "Proposed by:" in the header cell
|
||||
for row in content.select('tr'):
|
||||
cells = row.select('th, td')
|
||||
if len(cells) >= 2:
|
||||
header_text = cells[0].get_text().strip().lower()
|
||||
if "proposed by" in header_text:
|
||||
user_link = cells[1].select_one('a[href*="/wiki/User:"]')
|
||||
if user_link:
|
||||
href = user_link.get('href', '')
|
||||
title = user_link.get('title', '')
|
||||
|
||||
# Try to get username from title attribute first
|
||||
if title and title.startswith('User:'):
|
||||
proposal['proposer'] = title[5:] # Remove 'User:' prefix
|
||||
# Otherwise try to extract from href
|
||||
elif href:
|
||||
href_match = re.search(r'/wiki/User:([^/]+)', href)
|
||||
if href_match:
|
||||
proposal['proposer'] = href_match.group(1)
|
||||
|
||||
# If still no proposer, use the link text
|
||||
if 'proposer' not in proposal and user_link.get_text():
|
||||
proposal['proposer'] = user_link.get_text().strip()
|
||||
|
||||
# Add a delay to avoid overloading the server
|
||||
time.sleep(1)
|
||||
|
||||
proposals.append(proposal)
|
||||
|
||||
logger.info(f"Found {len(proposals)} voting proposals")
|
||||
return proposals
|
||||
|
wiki_compare/outdated_pages.json
@@ -2389,7 +2389,596 @@
|
|||
],
|
||||
"common": []
|
||||
},
|
||||
"proposed_translation": " Voici la traduction du texte anglais vers le français :\n\nType Description\nType de relation. Groupe : propriétés Utilisés sur ces éléments\nValeurs documentées : 20\nVoir aussi * :type =*\nStatut : de facto type = *\nPlus de détails à la page info des balises\nTools pour cette balise taginfo · AD · AT · BR · BY · CH · CN · CZ · DE · DK · FI · FR · GB · GR · HU · IN · IR · IT · LI · LU · JP · KP · KR · NL · PL · PT · RU · ES · AR · MX · CO · BO · CL · EC · PY · PE · UY · VE · TW · UA · US · VN\noverpass-turbo OSM Tag History\ntype =* sur un objet relation spécifie son type et les interactions entre ses membres. Les types de relations établis et proposés sont listés ci-dessous. type a été également occasionnellement utilisé comme balise supplémentaire pour spécifier une \"variante\" d'une catégorie de fonctionnalité sur les voies et points. Cette approche est en conflit avec son utilisation dans les relations et devrait être évitée . De plus, pour les éléments possédant plusieurs balises, il n'est pas clair à quelle balise le type correspond. Au lieu de cela, utiliser une approche basée sur un suffixe ou une sous-balise, comme décrit dans *:type =* .\nContenu\n1 Relations établies\n2 Relations peu utilisées\n3 Utilisations proposées\n3.1 Junctions, intersections, grade separated crossings, and embankments\n3.2 Area hierarchies et autres relations pour les zones\n3.3 Adressage\n3.4 Autres\n4 Possibles erreurs de balises\n5 Qualité Assurance\n6 Voir aussi\nRelations établies\nType Statut Membres Commentaires Statistics Image\nmultipolygon de facto ( )\nZones où l'enveloppe se compose de plusieurs voies, ou qui possèdent des trous.\nroute de facto\nUne route établie (généralement signalée) sur une voie\nvaleurs documentées : 10\nvoir aussi *:route =*\nturn_restriction =*\nOpposition à la balise\n3.5 Système de signalisation\n\nNota bene : Il y a plusieurs erreurs dans le texte original anglais, ainsi que des erreurs de traduction dans la version française. C'est pourquoi j'ai fourni une traduction corrigée ici."
|
||||
"proposed_translation": " Voici la traduction du texte anglais vers le français :\n\nType Description\nType de relation. Groupe : propriétés Utilisés sur ces éléments\nValeurs documentées : 20\nVoir aussi * :type =*\nStatut : de facto type = *\nPlus de détails à la page info des balises\nTools pour cette balise taginfo · AD · AT · BR · BY · CH · CN · CZ · DE · DK · FI · FR · GB · GR · HU · IN · IR · IT · LI · LU · JP · KP · KR · NL · PL · PT · RU · ES · AR · MX · CO · BO · CL · EC · PY · PE · UY · VE · TW · UA · US · VN\noverpass-turbo OSM Tag History\ntype =* sur un objet relation spécifie son type et les interactions entre ses membres. Les types de relations établis et proposés sont listés ci-dessous. type a été également occasionnellement utilisé comme balise supplémentaire pour spécifier une \"variante\" d'une catégorie de fonctionnalité sur les voies et points. Cette approche est en conflit avec son utilisation dans les relations et devrait être évitée . De plus, pour les éléments possédant plusieurs balises, il n'est pas clair à quelle balise le type correspond. Au lieu de cela, utiliser une approche basée sur un suffixe ou une sous-balise, comme décrit dans *:type =* .\nContenu\n1 Relations établies\n2 Relations peu utilisées\n3 Utilisations proposées\n3.1 Junctions, intersections, grade separated crossings, and embankments\n3.2 Area hierarchies et autres relations pour les zones\n3.3 Adressage\n3.4 Autres\n4 Possibles erreurs de balises\n5 Qualité Assurance\n6 Voir aussi\nRelations établies\nType Statut Membres Commentaires Statistics Image\nmultipolygon de facto ( )\nZones où l'enveloppe se compose de plusieurs voies, ou qui possèdent des trous.\nroute de facto\nUne route établie (généralement signalée) sur une voie\nvaleurs documentées : 10\nvoir aussi *:route =*\nturn_restriction =*\nOpposition à la balise\n3.5 Système de signalisation\n\nNota bene : Il y a plusieurs erreurs dans le texte original anglais, ainsi que des erreurs de traduction dans la version française. C'est pourquoi j'ai fourni une traduction corrigée ici.",
|
||||
"grammar_suggestions": [
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 2043,
|
||||
"end": 2045,
|
||||
"type": "typo",
|
||||
"message": "Pas d’espace avant ce signe.",
|
||||
"suggestions": [
|
||||
")"
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 2326,
|
||||
"end": 2328,
|
||||
"type": "typo",
|
||||
"message": "Pas d’espace avant ce signe.",
|
||||
"suggestions": [
|
||||
")"
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 496,
|
||||
"end": 498,
|
||||
"type": "typo",
|
||||
"message": "Pas d’espace avant un point.",
|
||||
"suggestions": [
|
||||
"."
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 1429,
|
||||
"end": 1431,
|
||||
"type": "typo",
|
||||
"message": "Pas d’espace avant un point.",
|
||||
"suggestions": [
|
||||
"."
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 2673,
|
||||
"end": 2675,
|
||||
"type": "typo",
|
||||
"message": "Pas d’espace avant un point.",
|
||||
"suggestions": [
|
||||
"."
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 1846,
|
||||
"end": 1848,
|
||||
"type": "typo",
|
||||
"message": "Pas d’espace avant une virgule.",
|
||||
"suggestions": [
|
||||
","
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 2579,
|
||||
"end": 2581,
|
||||
"type": "typo",
|
||||
"message": "Pas d’espace avant une virgule.",
|
||||
"suggestions": [
|
||||
","
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 2690,
|
||||
"end": 2693,
|
||||
"type": "typo",
|
||||
"message": "Guillemets isolés.",
|
||||
"suggestions": [
|
||||
" « ",
|
||||
" » ",
|
||||
" “",
|
||||
"” "
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 2767,
|
||||
"end": 2769,
|
||||
"type": "typo",
|
||||
"message": "Guillemets fermants.",
|
||||
"suggestions": [
|
||||
" »",
|
||||
"”"
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 410,
|
||||
"end": 412,
|
||||
"type": "apos",
|
||||
"message": "Apostrophe typographique.",
|
||||
"suggestions": [
|
||||
"L’"
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 482,
|
||||
"end": 484,
|
||||
"type": "apos",
|
||||
"message": "Apostrophe typographique.",
|
||||
"suggestions": [
|
||||
"d’"
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 532,
|
||||
"end": 534,
|
||||
"type": "apos",
|
||||
"message": "Apostrophe typographique.",
|
||||
"suggestions": [
|
||||
"s’"
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 555,
|
||||
"end": 557,
|
||||
"type": "apos",
|
||||
"message": "Apostrophe typographique.",
|
||||
"suggestions": [
|
||||
"l’"
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 579,
|
||||
"end": 581,
|
||||
"type": "apos",
|
||||
"message": "Apostrophe typographique.",
|
||||
"suggestions": [
|
||||
"d’"
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 1712,
|
||||
"end": 1714,
|
||||
"type": "apos",
|
||||
"message": "Apostrophe typographique.",
|
||||
"suggestions": [
|
||||
"l’"
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 1870,
|
||||
"end": 1872,
|
||||
"type": "apos",
|
||||
"message": "Apostrophe typographique.",
|
||||
"suggestions": [
|
||||
"d’"
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 1992,
|
||||
"end": 1994,
|
||||
"type": "apos",
|
||||
"message": "Apostrophe typographique.",
|
||||
"suggestions": [
|
||||
"l’"
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 2464,
|
||||
"end": 2466,
|
||||
"type": "apos",
|
||||
"message": "Apostrophe typographique.",
|
||||
"suggestions": [
|
||||
"d’"
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 2509,
|
||||
"end": 2511,
|
||||
"type": "apos",
|
||||
"message": "Apostrophe typographique.",
|
||||
"suggestions": [
|
||||
"l’"
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 1723,
|
||||
"end": 1727,
|
||||
"type": "maj",
|
||||
"message": "Après un point, une majuscule est généralement requise.",
|
||||
"suggestions": [
|
||||
"Type"
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 1292,
|
||||
"end": 1304,
|
||||
"type": "typo",
|
||||
"message": "Il manque un espace.",
|
||||
"suggestions": [
|
||||
" multipolygon"
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 1334,
|
||||
"end": 1339,
|
||||
"type": "typo",
|
||||
"message": "Il manque un espace.",
|
||||
"suggestions": [
|
||||
" route"
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 1372,
|
||||
"end": 1380,
|
||||
"type": "typo",
|
||||
"message": "Il manque un espace.",
|
||||
"suggestions": [
|
||||
" building"
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 2484,
|
||||
"end": 2490,
|
||||
"type": "typo",
|
||||
"message": "Il manque un espace.",
|
||||
"suggestions": [
|
||||
" source"
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 2570,
|
||||
"end": 2579,
|
||||
"type": "typo",
|
||||
"message": "Il manque un espace.",
|
||||
"suggestions": [
|
||||
" Attributs"
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 2649,
|
||||
"end": 2657,
|
||||
"type": "typo",
|
||||
"message": "Il manque un espace.",
|
||||
"suggestions": [
|
||||
" Éléments"
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 2749,
|
||||
"end": 2753,
|
||||
"type": "typo",
|
||||
"message": "Il manque un espace.",
|
||||
"suggestions": [
|
||||
" type"
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 51,
|
||||
"end": 52,
|
||||
"type": "nbsp",
|
||||
"message": "Il manque un espace insécable.",
|
||||
"suggestions": [
|
||||
" :"
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 96,
|
||||
"end": 98,
|
||||
"type": "nbsp",
|
||||
"message": "Il manque un espace insécable.",
|
||||
"suggestions": [
|
||||
" :"
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 617,
|
||||
"end": 619,
|
||||
"type": "nbsp",
|
||||
"message": "Il manque un espace insécable.",
|
||||
"suggestions": [
|
||||
" :"
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 644,
|
||||
"end": 646,
|
||||
"type": "nbsp",
|
||||
"message": "Il manque un espace insécable.",
|
||||
"suggestions": [
|
||||
" :"
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 678,
|
||||
"end": 680,
|
||||
"type": "nbsp",
|
||||
"message": "Il manque un espace insécable.",
|
||||
"suggestions": [
|
||||
" :"
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 698,
|
||||
"end": 700,
|
||||
"type": "nbsp",
|
||||
"message": "Il manque un espace insécable.",
|
||||
"suggestions": [
|
||||
" :"
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 780,
|
||||
"end": 782,
|
||||
"type": "nbsp",
|
||||
"message": "Il manque un espace insécable.",
|
||||
"suggestions": [
|
||||
" :"
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 817,
|
||||
"end": 819,
|
||||
"type": "nbsp",
|
||||
"message": "Il manque un espace insécable.",
|
||||
"suggestions": [
|
||||
" :"
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 844,
|
||||
"end": 846,
|
||||
"type": "nbsp",
|
||||
"message": "Il manque un espace insécable.",
|
||||
"suggestions": [
|
||||
" :"
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 859,
|
||||
"end": 861,
|
||||
"type": "nbsp",
|
||||
"message": "Il manque un espace insécable.",
|
||||
"suggestions": [
|
||||
" :"
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 942,
|
||||
"end": 944,
|
||||
"type": "nbsp",
|
||||
"message": "Il manque un espace insécable.",
|
||||
"suggestions": [
|
||||
" :"
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 962,
|
||||
"end": 964,
|
||||
"type": "nbsp",
|
||||
"message": "Il manque un espace insécable.",
|
||||
"suggestions": [
|
||||
" :"
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 1275,
|
||||
"end": 1277,
|
||||
"type": "nbsp",
|
||||
"message": "Il manque un espace insécable.",
|
||||
"suggestions": [
|
||||
" :"
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 1317,
|
||||
"end": 1319,
|
||||
"type": "nbsp",
|
||||
"message": "Il manque un espace insécable.",
|
||||
"suggestions": [
|
||||
" :"
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 1355,
|
||||
"end": 1357,
|
||||
"type": "nbsp",
|
||||
"message": "Il manque un espace insécable.",
|
||||
"suggestions": [
|
||||
" :"
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 1666,
|
||||
"end": 1668,
|
||||
"type": "nbsp",
|
||||
"message": "Il manque un espace insécable.",
|
||||
"suggestions": [
|
||||
" :"
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 1731,
|
||||
"end": 1733,
|
||||
"type": "nbsp",
|
||||
"message": "Il manque un espace insécable.",
|
||||
"suggestions": [
|
||||
" :"
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 1946,
|
||||
"end": 1948,
|
||||
"type": "nbsp",
|
||||
"message": "Il manque un espace insécable.",
|
||||
"suggestions": [
|
||||
" :"
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 2136,
|
||||
"end": 2138,
|
||||
"type": "nbsp",
|
||||
"message": "Il manque un espace insécable.",
|
||||
"suggestions": [
|
||||
" :"
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 2158,
|
||||
"end": 2160,
|
||||
"type": "nbsp",
|
||||
"message": "Il manque un espace insécable.",
|
||||
"suggestions": [
|
||||
" :"
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 2438,
|
||||
"end": 2440,
|
||||
"type": "nbsp",
|
||||
"message": "Il manque un espace insécable.",
|
||||
"suggestions": [
|
||||
" :"
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 2498,
|
||||
"end": 2500,
|
||||
"type": "nbsp",
|
||||
"message": "Il manque un espace insécable.",
|
||||
"suggestions": [
|
||||
" :"
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 2760,
|
||||
"end": 2767,
|
||||
"type": "num",
|
||||
"message": "Formatage des grands nombres.",
|
||||
"suggestions": [
|
||||
"2 060 502"
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 64,
|
||||
"end": 71,
|
||||
"type": "gn",
|
||||
"message": "Accord de genre erroné avec « Propriétés ».",
|
||||
"suggestions": [
|
||||
"Utilisée"
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 53,
|
||||
"end": 63,
|
||||
"type": "gn",
|
||||
"message": "Accord de nombre erroné avec « Utilisé ».",
|
||||
"suggestions": [
|
||||
"Propriété"
|
||||
],
|
||||
"context": ""
|
||||
},
|
||||
{
|
||||
"paragraph": 2,
|
||||
"start": 356,
|
||||
"end": 358,
|
||||
"type": "typo",
|
||||
"message": "Nombre ordinal romain singulier. Exemples : IIᵉ, IIIᵉ, IVᵉ…",
|
||||
"suggestions": [
|
||||
"Vᵉ"
|
||||
],
|
||||
"context": ""
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"key": "surface",
|
||||
|
|
|
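Each `grammar_suggestions` entry pins one issue to a paragraph number and a character span (`start`/`end` offsets within that paragraph), along with grammalecte's message and candidate replacements. As a minimal sketch of how such entries could be applied to the paragraph text (the repository scripts only store suggestions; this `apply_suggestions` helper is hypothetical):

```python
def apply_suggestions(paragraph: str, entries: list) -> str:
    """Apply the first suggestion of each entry to one paragraph.

    `entries` are dicts shaped like the grammar_suggestions above
    (keys: start, end, suggestions). Hypothetical helper, not part
    of the repository.
    """
    # Process spans from the highest start offset down, so that a
    # replacement never shifts the offsets of spans still to come.
    for e in sorted(entries, key=lambda e: e["start"], reverse=True):
        if e["suggestions"]:
            paragraph = (paragraph[:e["start"]]
                         + e["suggestions"][0]
                         + paragraph[e["end"]:])
    return paragraph
```

Applying in descending offset order matters because each replacement can change the paragraph length; everything before the replaced span keeps its original offsets.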
@@ -1,12 +1,6 @@
 {
-  "last_updated": "2025-08-22T17:17:53.415750",
+  "last_updated": "2025-08-31T11:26:56.762197",
   "voting_proposals": [
-    {
-      "title": "Proposal:Man made=ceremonial gate",
-      "url": "https://wiki.openstreetmap.org/wiki/Proposal:Man_made%3Dceremonial_gate",
-      "status": "Voting",
-      "type": "voting"
-    },
     {
       "title": "Proposal:Developer",
       "url": "https://wiki.openstreetmap.org/wiki/Proposal:Developer",
@@ -14,5 +8,13 @@
       "type": "voting"
     }
   ],
-  "recent_proposals": []
+  "recent_proposals": [
+    {
+      "title": "Proposal:Pole types for public transportation",
+      "url": "https://wiki.openstreetmap.org/wiki/Proposal:Pole_types_for_public_transportation",
+      "last_modified": "",
+      "modified_by": "NFarras",
+      "type": "recent"
+    }
+  ]
 }
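The cache above is what fetch_proposals.py serves to the site: a `last_updated` timestamp plus `voting_proposals` and `recent_proposals` arrays. A minimal consumer sketch, assuming the cache file is named `proposals.json` in the working directory (the actual filename is whatever fetch_proposals.py writes):

```python
import json

# Read the proposals cache written by fetch_proposals.py.
# The filename "proposals.json" is an assumption for this sketch.
with open("proposals.json", encoding="utf-8") as f:
    data = json.load(f)

print("Cache written:", data["last_updated"])
for proposal in data["voting_proposals"]:
    print("Voting:", proposal["title"], "->", proposal["url"])
for proposal in data["recent_proposals"]:
    print("Recent:", proposal["title"], "(modified by", proposal["modified_by"] + ")")
```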
wiki_compare/suggest_grammar_improvements.py (new executable file, +381 lines)

@@ -0,0 +1,381 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

"""
suggest_grammar_improvements.py

This script reads the outdated_pages.json file, selects a wiki page (by default the first one),
and uses grammalecte to check the grammar of the French page content.
The grammar suggestions are saved in the "grammar_suggestions" property of the JSON file.

The script is compatible with different versions of the grammalecte API:
- For newer versions where GrammarChecker is directly in the grammalecte module
- For older versions where GrammarChecker is in the grammalecte.fr module

Usage:
    python suggest_grammar_improvements.py [--page KEY]

Options:
    --page KEY    Specify the key of the page to check (default: first page in the file)

Output:
    - Updated outdated_pages.json file with grammar suggestions
"""

import json
import argparse
import logging
import requests
import os
import sys
import subprocess
from bs4 import BeautifulSoup

try:
    import grammalecte
    import grammalecte.text as txt

    # Check if GrammarChecker is available directly in the grammalecte module (newer versions)
    try:
        from grammalecte import GrammarChecker
        GRAMMALECTE_DIRECT_API = True
    except ImportError:
        # Try the older API structure with fr submodule
        try:
            import grammalecte.fr as gr_fr
            GRAMMALECTE_DIRECT_API = False
        except ImportError:
            # Neither API is available
            raise ImportError("Could not import GrammarChecker from grammalecte")

    GRAMMALECTE_AVAILABLE = True
except ImportError:
    GRAMMALECTE_AVAILABLE = False
    GRAMMALECTE_DIRECT_API = False

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    datefmt='%Y-%m-%d %H:%M:%S'
)
logger = logging.getLogger(__name__)

# Constants
OUTDATED_PAGES_FILE = "outdated_pages.json"


def load_outdated_pages():
    """
    Load the outdated pages from the JSON file

    Returns:
        list: List of dictionaries containing outdated page information
    """
    try:
        with open(OUTDATED_PAGES_FILE, 'r', encoding='utf-8') as f:
            pages = json.load(f)
        logger.info(f"Successfully loaded {len(pages)} pages from {OUTDATED_PAGES_FILE}")
        return pages
    except (IOError, json.JSONDecodeError) as e:
        logger.error(f"Error loading pages from {OUTDATED_PAGES_FILE}: {e}")
        return []


def save_to_json(data, filename):
    """
    Save data to a JSON file

    Args:
        data: Data to save
        filename (str): Name of the file
    """
    try:
        with open(filename, 'w', encoding='utf-8') as f:
            json.dump(data, f, indent=2, ensure_ascii=False)
        logger.info(f"Data saved to {filename}")
    except IOError as e:
        logger.error(f"Error saving data to {filename}: {e}")


def fetch_wiki_page_content(url):
    """
    Fetch the content of a wiki page

    Args:
        url (str): URL of the wiki page

    Returns:
        str: Content of the wiki page
    """
    try:
        response = requests.get(url)
        response.raise_for_status()

        soup = BeautifulSoup(response.text, 'html.parser')

        # Get the main content
        content = soup.select_one('#mw-content-text')
        if content:
            # Remove script and style elements
            for script in content.select('script, style'):
                script.extract()

            # Remove .languages elements
            for languages_elem in content.select('.languages'):
                languages_elem.extract()

            # Get text
            text = content.get_text(separator=' ', strip=True)
            return text
        else:
            logger.warning(f"Could not find content in page: {url}")
            return ""

    except requests.exceptions.RequestException as e:
        logger.error(f"Error fetching wiki page content: {e}")
        return ""


def check_grammar_with_grammalecte(text):
    """
    Check grammar using grammalecte

    Args:
        text (str): Text to check

    Returns:
        list: List of grammar suggestions
    """
    if not GRAMMALECTE_AVAILABLE:
        logger.error("Grammalecte is not installed. Please install it with: pip install grammalecte")
        return []

    try:
        logger.info("Checking grammar with grammalecte")

        # Initialize grammalecte based on which API version is available
        if GRAMMALECTE_DIRECT_API:
            # New API: GrammarChecker is directly in grammalecte module
            logger.info("Using direct GrammarChecker API")
            gce = GrammarChecker("fr")

            # Split text into paragraphs
            paragraphs = txt.getParagraph(text)

            # Check grammar for each paragraph
            suggestions = []
            for i, paragraph in enumerate(paragraphs):
                if paragraph.strip():
                    # Use getParagraphErrors method
                    errors = gce.getParagraphErrors(paragraph)
                    for error in errors:
                        # Filter out spelling errors if needed
                        if "sType" in error and error["sType"] != "WORD" and error.get("bError", True):
                            suggestion = {
                                "paragraph": i + 1,
                                "start": error.get("nStart", 0),
                                "end": error.get("nEnd", 0),
                                "type": error.get("sType", ""),
                                "message": error.get("sMessage", ""),
                                "suggestions": error.get("aSuggestions", []),
                                "context": paragraph[max(0, error.get("nStart", 0) - 20):min(len(paragraph), error.get("nEnd", 0) + 20)]
                            }
                            suggestions.append(suggestion)
        else:
            # Old API: GrammarChecker is in grammalecte.fr module
            logger.info("Using legacy grammalecte.fr.GrammarChecker API")
            gce = gr_fr.GrammarChecker("fr")

            # Split text into paragraphs
            paragraphs = txt.getParagraph(text)

            # Check grammar for each paragraph
            suggestions = []
            for i, paragraph in enumerate(paragraphs):
                if paragraph.strip():
                    # Use parse method for older API
                    for error in gce.parse(paragraph, "FR", False):
                        if error["sType"] != "WORD" and error["bError"]:
                            suggestion = {
                                "paragraph": i + 1,
                                "start": error["nStart"],
                                "end": error["nEnd"],
                                "type": error["sType"],
                                "message": error["sMessage"],
                                "suggestions": error.get("aSuggestions", []),
                                "context": paragraph[max(0, error["nStart"] - 20):min(len(paragraph), error["nEnd"] + 20)]
                            }
                            suggestions.append(suggestion)

        logger.info(f"Found {len(suggestions)} grammar suggestions")
        return suggestions

    except Exception as e:
        logger.error(f"Error checking grammar with grammalecte: {e}")
        return []


def check_grammar_with_cli(text):
    """
    Check grammar using grammalecte-cli command

    Args:
        text (str): Text to check

    Returns:
        list: List of grammar suggestions
    """
    try:
        logger.info("Checking grammar with grammalecte-cli")

        # Create a temporary file with the text
        temp_file = "temp_text_for_grammar_check.txt"
        with open(temp_file, 'w', encoding='utf-8') as f:
            f.write(text)

        # Run grammalecte-cli
        cmd = ["grammalecte-cli", "--json", "--file", temp_file]
        result = subprocess.run(cmd, capture_output=True, text=True, encoding='utf-8')

        # Remove temporary file
        if os.path.exists(temp_file):
            os.remove(temp_file)

        if result.returncode != 0:
            logger.error(f"Error running grammalecte-cli: {result.stderr}")
            return []

        # Parse JSON output
        output = json.loads(result.stdout)

        # Extract grammar suggestions
        suggestions = []
        for paragraph_data in output.get("data", []):
            paragraph_index = paragraph_data.get("iParagraph", 0)
            for error in paragraph_data.get("lGrammarErrors", []):
                suggestion = {
                    "paragraph": paragraph_index + 1,
                    "start": error.get("nStart", 0),
                    "end": error.get("nEnd", 0),
                    "type": error.get("sType", ""),
                    "message": error.get("sMessage", ""),
                    "suggestions": error.get("aSuggestions", []),
                    "context": error.get("sContext", "")
                }
                suggestions.append(suggestion)

        logger.info(f"Found {len(suggestions)} grammar suggestions")
        return suggestions

    except Exception as e:
        logger.error(f"Error checking grammar with grammalecte-cli: {e}")
        return []


def check_grammar(text):
    """
    Check grammar using available method (Python library or CLI)

    Args:
        text (str): Text to check

    Returns:
        list: List of grammar suggestions
    """
    # Try using the Python library first
    if GRAMMALECTE_AVAILABLE:
        return check_grammar_with_grammalecte(text)

    # Fall back to CLI if available
    try:
        # Check if grammalecte-cli is available
        subprocess.run(["grammalecte-cli", "--help"], capture_output=True)
        return check_grammar_with_cli(text)
    except (subprocess.SubprocessError, FileNotFoundError):
        logger.error("Neither grammalecte Python package nor grammalecte-cli is available.")
        logger.error("Please install grammalecte with: pip install grammalecte")
        return []


def select_page_for_grammar_check(pages, key=None):
    """
    Select a page for grammar checking

    Args:
        pages (list): List of dictionaries containing page information
        key (str): Key of the page to select (if None, select the first page)

    Returns:
        dict: Selected page or None if no suitable page found
    """
    if not pages:
        logger.warning("No pages found that need grammar checking")
        return None

    if key:
        # Find the page with the specified key
        for page in pages:
            if page.get('key') == key:
                # Check if the page has a French version
                if page.get('fr_page') is None:
                    logger.warning(f"Page with key '{key}' does not have a French version")
                    return None
                logger.info(f"Selected page for key '{key}' for grammar checking")
                return page

        logger.warning(f"No page found with key '{key}'")
        return None
    else:
        # Select the first page that has a French version
        for page in pages:
            if page.get('fr_page') is not None:
                logger.info(f"Selected first page with French version (key '{page['key']}') for grammar checking")
                return page

        logger.warning("No pages found with French versions")
        return None


def main():
    """Main function to execute the script"""
    parser = argparse.ArgumentParser(description="Suggest grammar improvements for an OSM wiki page using grammalecte")
    parser.add_argument("--page", help="Key of the page to check (default: first page with a French version)")
    args = parser.parse_args()

    logger.info("Starting suggest_grammar_improvements.py")

    # Load pages
    pages = load_outdated_pages()
    if not pages:
        logger.error("No pages found. Run wiki_compare.py first.")
        sys.exit(1)

    # Select a page for grammar checking
    selected_page = select_page_for_grammar_check(pages, args.page)
    if not selected_page:
        logger.error("Could not select a page for grammar checking.")
        sys.exit(1)

    # Get the French page URL
    fr_url = selected_page.get('fr_page', {}).get('url')
    if not fr_url:
        logger.error(f"No French page URL found for key '{selected_page['key']}'")
        sys.exit(1)

    # Fetch the content of the French page
    logger.info(f"Fetching content from {fr_url}")
    content = fetch_wiki_page_content(fr_url)
    if not content:
        logger.error(f"Could not fetch content from {fr_url}")
        sys.exit(1)

    # Check grammar
    logger.info(f"Checking grammar for key '{selected_page['key']}'")
    suggestions = check_grammar(content)
    if not suggestions:
        logger.warning("No grammar suggestions found or grammar checker not available")

    # Save the grammar suggestions in the JSON file
    logger.info(f"Saving grammar suggestions for key '{selected_page['key']}'")
    selected_page['grammar_suggestions'] = suggestions

    # Save the updated data back to the file
    save_to_json(pages, OUTDATED_PAGES_FILE)

    logger.info("Script completed successfully")


if __name__ == "__main__":
    main()
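A sketch of a typical round trip with the script above: run it against one key, then read back the `grammar_suggestions` it persists into outdated_pages.json. The key "type" matches the entry shown earlier; any key present in the file works, and both files are assumed to sit in the working directory:

```python
import json
import subprocess
from collections import Counter

# Run the checker for one page key (default would be the first
# page with a French version).
subprocess.run(
    ["python", "suggest_grammar_improvements.py", "--page", "type"],
    check=True,
)

# The suggestions are written back into outdated_pages.json.
with open("outdated_pages.json", encoding="utf-8") as f:
    pages = json.load(f)

page = next(p for p in pages if p.get("key") == "type")
print(Counter(s["type"] for s in page["grammar_suggestions"]))
# With the data shown above this would print:
# Counter({'nbsp': 22, 'typo': 16, 'apos': 10, 'gn': 2, 'maj': 1, 'num': 1})
```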
@@ -1,5 +1,5 @@
 {
-  "last_updated": "2025-08-22T15:22:37.234265",
+  "last_updated": "2025-08-31T11:28:25.264096",
   "untranslated_pages": [
     {
       "title": "FR:2017 Ouragans Irma et Maria",
@@ -517,12 +517,6 @@
       "url": "https://wiki.openstreetmap.org/wiki/FR:Compl%C3%A8te_Tes_Commerces/Avanc%C3%A9",
       "has_translation": false
     },
-    {
-      "title": "FR:Complète Tes Commerces/Debutant",
-      "key": "Complète Tes Commerces/Debutant",
-      "url": "https://wiki.openstreetmap.org/wiki/FR:Compl%C3%A8te_Tes_Commerces/Debutant",
-      "has_translation": false
-    },
     {
       "title": "FR:Complète Tes Commerces/Débutant",
       "key": "Complète Tes Commerces/Débutant",
@@ -942,6 +936,12 @@
       "key": "France/Map/Terres australes et antarctiques françaises",
       "url": "https://wiki.openstreetmap.org/wiki/FR:France/Map/Terres_australes_et_antarctiques_fran%C3%A7aises",
       "has_translation": false
     },
+    {
+      "title": "FR:France/Map/Wallis-et-Futuna",
+      "key": "France/Map/Wallis-et-Futuna",
+      "url": "https://wiki.openstreetmap.org/wiki/FR:France/Map/Wallis-et-Futuna",
+      "has_translation": false
+    }
   ]
 }