Add info from the wiki proposal archives

This commit is contained in:
Tykayn 2025-08-31 12:22:07 +02:00 committed by tykayn
parent 7665f1d99c
commit 9bd1fddd8a
9 changed files with 2517 additions and 27 deletions

View file

@ -5,7 +5,7 @@ jour ou de traductions, et publier des suggestions sur Mastodon pour encourager
## Overview
The project includes nine main scripts:
The project includes ten main scripts:
1. **wiki_compare.py**: Fetches the 50 most-used OSM keys, compares their English and French wiki pages, and
identifies the ones that need updating.
@ -15,19 +15,21 @@ Le projet comprend neuf scripts principaux :
translation suggestion on Mastodon.
4. **propose_translation.py**: Selects a wiki page (the first one by default) and uses Ollama with the
"mistral:7b" model to propose a translation, which is saved to the outdated_pages.json file.
5. **detect_suspicious_deletions.py**: Analyzes recent OSM wiki changes to detect suspicious deletions
5. **suggest_grammar_improvements.py**: Selects a French wiki page (the first one by default) and uses grammalecte
to check the grammar and propose improvements, which are saved to the outdated_pages.json file.
6. **detect_suspicious_deletions.py**: Analyzes recent OSM wiki changes to detect suspicious deletions
(more than 20 characters) and saves them to a JSON file for display on the website.
6. **fetch_proposals.py**: Fetches OSM tag proposals currently under vote as well as recently modified proposals,
7. **fetch_proposals.py**: Fetches OSM tag proposals currently under vote as well as recently modified proposals,
and saves them to a JSON file for display on the website. The data is cached for one hour
to avoid overly frequent requests to the wiki server (see the cache-check sketch after this list).
7. **find_untranslated_french_pages.py**: Identifies French wiki pages that have no English translation
8. **find_untranslated_french_pages.py**: Identifies French wiki pages that have no English translation
and saves them to a JSON file for display on the website. The data is cached for one hour.
8. **find_pages_unavailable_in_french.py**: Scrapes the category of pages unavailable in French, handles pagination
9. **find_pages_unavailable_in_french.py**: Scrapes the category of pages unavailable in French, handles pagination
to retrieve all pages, groups them by language prefix, and prioritizes pages starting with "En:". The data
is cached for one hour.
9. **fetch_osm_fr_groups.py**: Fetches information about OSM-FR working groups and local groups
from the #Pages_des_groupes_locaux section and saves it to a JSON file for display on the website.
The data is cached for one hour.
10. **fetch_osm_fr_groups.py**: Fetches information about OSM-FR working groups and local groups
from the #Pages_des_groupes_locaux section and saves it to a JSON file for display on the website.
The data is cached for one hour.
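All of the scripts that cache their output follow the same one-hour rule. A minimal sketch of that check, based on the file's modification time (the real logic lives in each script, e.g. `should_update_cache()` in fetch_proposals.py; the function below is only an illustration):
```python
import os
import time

CACHE_TIMEOUT_HOURS = 1  # assumption: mirrors the scripts' one-hour cache

def should_update_cache(path: str) -> bool:
    """Return True when the cached JSON is missing or older than one hour."""
    if not os.path.exists(path):
        return True
    age_hours = (time.time() - os.path.getmtime(path)) / 3600
    return age_hours >= CACHE_TIMEOUT_HOURS
```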
## Installation
@ -53,6 +55,12 @@ Pour utiliser le script propose_translation.py, vous devez également installer
ollama pull mistral:7b
```
To use the suggest_grammar_improvements.py script, you must install grammalecte:
```bash
pip install grammalecte
```
## Configuration
### Mastodon API
@ -144,6 +152,28 @@ ollama pull mistral:7b
The script saves the proposed translation in the "proposed_translation" property of the corresponding entry in the outdated_pages.json file.
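For illustration, a minimal sketch of reading that property back, assuming outdated_pages.json is a list of entries each carrying a "key" field as in the excerpt shown further below ("type" is just an example key):
```python
import json

with open("outdated_pages.json", encoding="utf-8") as f:
    pages = json.load(f)

# Print the translation proposed for the "type" entry, if any.
entry = next((p for p in pages if p.get("key") == "type"), None)
if entry and entry.get("proposed_translation"):
    print(entry["proposed_translation"])
```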
### Suggesting grammar improvements with grammalecte
To select a French wiki page (by default the first one that has a French version) and generate grammar improvement suggestions with grammalecte:
```bash
./suggest_grammar_improvements.py
```
To check a specific page by its key:
```bash
./suggest_grammar_improvements.py --page type
```
Note: this script requires grammalecte to be installed. To install it, run:
```bash
pip install grammalecte
```
The script saves the grammar suggestions in the "grammar_suggestions" property of the corresponding entry in the outdated_pages.json file. These suggestions are then used by Symfony in the template to display possible corrections for the French version of the page in a dedicated section.
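Outside of the Symfony template, the same data can be inspected directly. A small sketch, with field names taken from the excerpt further below (the file path and the "type" key are assumptions):
```python
import json

with open("outdated_pages.json", encoding="utf-8") as f:
    pages = json.load(f)

entry = next((p for p in pages if p.get("key") == "type"), {})
for s in entry.get("grammar_suggestions", []):
    # Each suggestion records the paragraph, the character range,
    # the grammalecte message and the proposed replacements.
    print(f'§{s["paragraph"]} [{s["start"]}-{s["end"]}] {s["type"]}: {s["message"]}')
    if s.get("suggestions"):
        print("  suggestions:", ", ".join(s["suggestions"]))
```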
### Detecting suspicious deletions
To analyze recent OSM wiki changes and detect suspicious deletions:
@ -302,7 +332,28 @@ Contient des informations détaillées sur les pages qui ont besoin de mises à
"date_diff": 491,
"word_diff": 700,
"section_diff": 2,
"priority": 250.5
"priority": 250.5,
"proposed_translation": "Texte de la traduction proposée...",
"grammar_suggestions": [
{
"paragraph": 1,
"start": 45,
"end": 52,
"type": "ACCORD",
"message": "Accord avec le nom : « bâtiments » est masculin pluriel.",
"suggestions": ["grands"],
"context": "...les grandes bâtiments de la ville..."
},
{
"paragraph": 3,
"start": 120,
"end": 128,
"type": "CONJUGAISON",
"message": "Conjugaison erronée. Accord avec « ils ».",
"suggestions": ["peuvent"],
"context": "...les bâtiments peut être classés..."
}
]
},
{
"key": "amenity",

View file

@ -0,0 +1,567 @@
{
"last_updated": "2025-08-31T12:12:58.757275",
"proposals": [
{
"title": "Proposal:4WD Only",
"url": "https://wiki.openstreetmap.org/wiki/Proposal:4WD_Only",
"last_modified": "30 April 2023",
"proposer": "Gaffa",
"section_count": 0,
"link_count": 16,
"word_count": 152,
"votes": {
"approve": {
"count": 0,
"users": []
},
"oppose": {
"count": 0,
"users": []
},
"abstain": {
"count": 0,
"users": []
}
},
"total_votes": 0,
"approve_percentage": 0,
"oppose_percentage": 0,
"abstain_percentage": 0
},
{
"title": "Proposal:Access: name space",
"url": "https://wiki.openstreetmap.org/wiki/Proposal:Access:_name_space",
"last_modified": "30 April 2023",
"proposer": "Hawke",
"section_count": 0,
"link_count": 11,
"word_count": 109,
"votes": {
"approve": {
"count": 0,
"users": []
},
"oppose": {
"count": 0,
"users": []
},
"abstain": {
"count": 0,
"users": []
}
},
"total_votes": 0,
"approve_percentage": 0,
"oppose_percentage": 0,
"abstain_percentage": 0
},
{
"title": "Proposal:Add ability to specify ordering-only phone number, sms-only phone numbers and related tags",
"url": "https://wiki.openstreetmap.org/wiki/Proposal:Add_ability_to_specify_ordering-only_phone_number,_sms-only_phone_numbers_and_related_tags",
"last_modified": "8 June 2025",
"proposer": "JOlshefsky",
"section_count": 10,
"link_count": 104,
"word_count": 808,
"votes": {
"approve": {
"count": 11,
"users": [
{
"username": "JOlshefsky",
"date": "10:52, 16 July 2024"
},
{
"username": "Mueschel",
"date": "09:40, 18 July 2024"
},
{
"username": "Chris2map",
"date": "17:11, 18 July 2024"
},
{
"username": "Broiledpeas",
"date": "22:20, 18 July 2024"
},
{
"username": "Nospam2005",
"date": "19:59, 21 July 2024"
},
{
"username": "GanderPL",
"date": "08:50, 22 July 2024"
},
{
"username": "EneaSuper",
"date": "14:04, 22 July 2024"
},
{
"username": "Uboot",
"date": "07:20, 25 July 2024"
},
{
"username": "Jean-Baptiste",
"date": "12:40, 28 July 2024"
},
{
"username": "Emmanuel",
"date": "18:14, 28 July 2024"
},
{
"username": "Hocu",
"date": "06:06, 31 July 2024"
}
]
},
"oppose": {
"count": 0,
"users": []
},
"abstain": {
"count": 4,
"users": [
{
"username": "Woodpeck",
"date": "21:15, 19 July 2024"
},
{
"username": "Hedaja",
"date": "15:41, 21 July 2024"
},
{
"username": "501ghost",
"date": "06:36, 24 July 2024"
},
{
"username": "Nadjita",
"date": "08:33, 26 July 2024"
}
]
}
},
"total_votes": 15,
"approve_percentage": 73.3,
"oppose_percentage": 0.0,
"abstain_percentage": 26.7
},
{
"title": "Proposal:Add strolling to sac scale and some further refinements",
"url": "https://wiki.openstreetmap.org/wiki/Proposal:Add_strolling_to_sac_scale_and_some_further_refinements",
"last_modified": "5 November 2024",
"proposer": null,
"section_count": 20,
"link_count": 268,
"word_count": 2329,
"votes": {
"approve": {
"count": 21,
"users": [
{
"username": "Supsup",
"date": "18:21, 15 October 2024"
},
{
"username": "Cick0",
"date": "21:43, 15 October 2024"
},
{
"username": "Fizzie41",
"date": "21:47, 15 October 2024"
},
{
"username": "Segubi",
"date": null
},
{
"username": "rhhs",
"date": "06:43, 16 October 2024"
},
{
"username": "Alan",
"date": "07:39, 16 October 2024"
},
{
"username": "VojtaFilip",
"date": "08:00, 16 October 2024"
},
{
"username": "Adamfranco",
"date": "13:21, 16 October 2024"
},
{
"username": "Yvecai",
"date": "05:07, 18 October 2024"
},
{
"username": "Woazboat",
"date": "12:15, 18 October 2024"
},
{
"username": "julcnx",
"date": "15:16, 20 October 2024"
},
{
"username": "JIDB",
"date": "17:17, 20 October 2024"
},
{
"username": "Pb07",
"date": "18:04, 20 October 2024"
},
{
"username": "Lumikeiju",
"date": "18:46, 20 October 2024"
},
{
"username": "Heilbron",
"date": "20:43, 20 October 2024"
},
{
"username": "Aighes",
"date": "21:30, 20 October 2024"
},
{
"username": "Crodthauser",
"date": "21:55, 20 October 2024"
},
{
"username": "Adiatmad",
"date": "02:26, 21 October 2024"
},
{
"username": "Jonathan",
"date": "10:46, 21 October 2024"
},
{
"username": "mahau",
"date": "17:34, 21 October 2024"
},
{
"username": "EneaSuper",
"date": "13:43, 27 October 2024"
}
]
},
"oppose": {
"count": 4,
"users": [
{
"username": "chris66",
"date": "06:22, 16 October 2024"
},
{
"username": "Skyper",
"date": "11:55, 16 October 2024"
},
{
"username": "Nop",
"date": "06:41, 17 October 2024"
},
{
"username": "Fabi2",
"date": "19:27, 20 October 2024"
}
]
},
"abstain": {
"count": 2,
"users": [
{
"username": "Chris2map",
"date": "17:39, 17 October 2024"
},
{
"username": "Nospam2005",
"date": "13:40, 20 October 2024"
}
]
}
},
"total_votes": 27,
"approve_percentage": 77.8,
"oppose_percentage": 14.8,
"abstain_percentage": 7.4
}
],
"statistics": {
"total_proposals": 4,
"total_votes": 42,
"avg_votes_per_proposal": 10.5,
"unique_voters": 39,
"top_voters": [
{
"username": "Chris2map",
"total": 2,
"approve": 1,
"oppose": 0,
"abstain": 1
},
{
"username": "Nospam2005",
"total": 2,
"approve": 1,
"oppose": 0,
"abstain": 1
},
{
"username": "EneaSuper",
"total": 2,
"approve": 2,
"oppose": 0,
"abstain": 0
},
{
"username": "JOlshefsky",
"total": 1,
"approve": 1,
"oppose": 0,
"abstain": 0
},
{
"username": "Mueschel",
"total": 1,
"approve": 1,
"oppose": 0,
"abstain": 0
},
{
"username": "Broiledpeas",
"total": 1,
"approve": 1,
"oppose": 0,
"abstain": 0
},
{
"username": "GanderPL",
"total": 1,
"approve": 1,
"oppose": 0,
"abstain": 0
},
{
"username": "Uboot",
"total": 1,
"approve": 1,
"oppose": 0,
"abstain": 0
},
{
"username": "Jean-Baptiste",
"total": 1,
"approve": 1,
"oppose": 0,
"abstain": 0
},
{
"username": "Emmanuel",
"total": 1,
"approve": 1,
"oppose": 0,
"abstain": 0
},
{
"username": "Hocu",
"total": 1,
"approve": 1,
"oppose": 0,
"abstain": 0
},
{
"username": "Woodpeck",
"total": 1,
"approve": 0,
"oppose": 0,
"abstain": 1
},
{
"username": "Hedaja",
"total": 1,
"approve": 0,
"oppose": 0,
"abstain": 1
},
{
"username": "501ghost",
"total": 1,
"approve": 0,
"oppose": 0,
"abstain": 1
},
{
"username": "Nadjita",
"total": 1,
"approve": 0,
"oppose": 0,
"abstain": 1
},
{
"username": "Supsup",
"total": 1,
"approve": 1,
"oppose": 0,
"abstain": 0
},
{
"username": "Cick0",
"total": 1,
"approve": 1,
"oppose": 0,
"abstain": 0
},
{
"username": "Fizzie41",
"total": 1,
"approve": 1,
"oppose": 0,
"abstain": 0
},
{
"username": "Segubi",
"total": 1,
"approve": 1,
"oppose": 0,
"abstain": 0
},
{
"username": "rhhs",
"total": 1,
"approve": 1,
"oppose": 0,
"abstain": 0
},
{
"username": "Alan",
"total": 1,
"approve": 1,
"oppose": 0,
"abstain": 0
},
{
"username": "VojtaFilip",
"total": 1,
"approve": 1,
"oppose": 0,
"abstain": 0
},
{
"username": "Adamfranco",
"total": 1,
"approve": 1,
"oppose": 0,
"abstain": 0
},
{
"username": "Yvecai",
"total": 1,
"approve": 1,
"oppose": 0,
"abstain": 0
},
{
"username": "Woazboat",
"total": 1,
"approve": 1,
"oppose": 0,
"abstain": 0
},
{
"username": "julcnx",
"total": 1,
"approve": 1,
"oppose": 0,
"abstain": 0
},
{
"username": "JIDB",
"total": 1,
"approve": 1,
"oppose": 0,
"abstain": 0
},
{
"username": "Pb07",
"total": 1,
"approve": 1,
"oppose": 0,
"abstain": 0
},
{
"username": "Lumikeiju",
"total": 1,
"approve": 1,
"oppose": 0,
"abstain": 0
},
{
"username": "Heilbron",
"total": 1,
"approve": 1,
"oppose": 0,
"abstain": 0
},
{
"username": "Aighes",
"total": 1,
"approve": 1,
"oppose": 0,
"abstain": 0
},
{
"username": "Crodthauser",
"total": 1,
"approve": 1,
"oppose": 0,
"abstain": 0
},
{
"username": "Adiatmad",
"total": 1,
"approve": 1,
"oppose": 0,
"abstain": 0
},
{
"username": "Jonathan",
"total": 1,
"approve": 1,
"oppose": 0,
"abstain": 0
},
{
"username": "mahau",
"total": 1,
"approve": 1,
"oppose": 0,
"abstain": 0
},
{
"username": "chris66",
"total": 1,
"approve": 0,
"oppose": 1,
"abstain": 0
},
{
"username": "Skyper",
"total": 1,
"approve": 0,
"oppose": 1,
"abstain": 0
},
{
"username": "Nop",
"total": 1,
"approve": 0,
"oppose": 1,
"abstain": 0
},
{
"username": "Fabi2",
"total": 1,
"approve": 0,
"oppose": 1,
"abstain": 0
}
]
}
}

View file

@ -0,0 +1,66 @@
{
"last_updated": "2025-08-31T12:11:31.163320",
"proposals": [
{
"title": "Proposal:4WD Only",
"url": "https://wiki.openstreetmap.org/wiki/Proposal:4WD_Only",
"last_modified": "30 April 2023",
"proposer": "Gaffa",
"section_count": 0,
"link_count": 16,
"word_count": 152,
"votes": {
"approve": {
"count": 0,
"users": []
},
"oppose": {
"count": 0,
"users": []
},
"abstain": {
"count": 0,
"users": []
}
},
"total_votes": 0,
"approve_percentage": 0,
"oppose_percentage": 0,
"abstain_percentage": 0
},
{
"title": "Proposal:Access: name space",
"url": "https://wiki.openstreetmap.org/wiki/Proposal:Access:_name_space",
"last_modified": "30 April 2023",
"proposer": "Hawke",
"section_count": 0,
"link_count": 11,
"word_count": 109,
"votes": {
"approve": {
"count": 0,
"users": []
},
"oppose": {
"count": 0,
"users": []
},
"abstain": {
"count": 0,
"users": []
}
},
"total_votes": 0,
"approve_percentage": 0,
"oppose_percentage": 0,
"abstain_percentage": 0
}
],
"statistics": {
"total_proposals": 2,
"total_votes": 0,
"avg_votes_per_proposal": 0.0,
"unique_voters": 0,
"top_voters": []
}
}

View file

@ -0,0 +1,625 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
fetch_archived_proposals.py
This script scrapes archived proposals from the OpenStreetMap wiki and extracts voting information.
It analyzes the voting patterns, counts votes by type (approve, oppose, abstain), and collects
information about the users who voted.
The script saves the data to a JSON file that can be used by the Symfony application.
Usage:
python fetch_archived_proposals.py [--force] [--limit N]
Options:
--force Force refresh of all proposals, even if they have already been processed
--limit N Limit processing to N proposals (default: process all proposals)
Output:
- archived_proposals.json file with voting information
"""
import argparse
import json
import logging
import os
import re
import sys
import time
from datetime import datetime
from urllib.parse import urljoin
import requests
from bs4 import BeautifulSoup
# Configure logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(levelname)s - %(message)s',
datefmt='%Y-%m-%d %H:%M:%S'
)
logger = logging.getLogger(__name__)
# Constants
ARCHIVED_PROPOSALS_URL = "https://wiki.openstreetmap.org/wiki/Category:Archived_proposals"
SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__))
ARCHIVED_PROPOSALS_FILE = os.path.join(SCRIPT_DIR, "archived_proposals.json")
USER_AGENT = "OSM-Commerces/1.0 (https://github.com/yourusername/osm-commerces; your@email.com)"
RATE_LIMIT_DELAY = 1 # seconds between requests to avoid rate limiting
# Vote patterns
VOTE_PATTERNS = {
'approve': [
r'I\s+(?:(?:strongly|fully|completely|wholeheartedly)\s+)?(?:approve|support|agree\s+with)\s+this\s+proposal',
r'I\s+vote\s+(?:to\s+)?(?:approve|support)',
r'(?:Symbol\s+support\s+vote\.svg|Symbol_support_vote\.svg)',
],
'oppose': [
r'I\s+(?:(?:strongly|fully|completely|wholeheartedly)\s+)?(?:oppose|disagree\s+with|reject|do\s+not\s+support)\s+this\s+proposal',
r'I\s+vote\s+(?:to\s+)?(?:oppose|reject|against)',
r'(?:Symbol\s+oppose\s+vote\.svg|Symbol_oppose_vote\.svg)',
],
'abstain': [
r'I\s+(?:have\s+comments\s+but\s+)?abstain\s+from\s+voting',
r'I\s+(?:have\s+comments\s+but\s+)?(?:neither\s+approve\s+nor\s+oppose|am\s+neutral)',
r'(?:Symbol\s+abstain\s+vote\.svg|Symbol_abstain_vote\.svg)',
]
}
def parse_arguments():
"""Parse command line arguments"""
parser = argparse.ArgumentParser(description='Fetch and analyze archived OSM proposals')
parser.add_argument('--force', action='store_true', help='Force refresh of all proposals')
parser.add_argument('--limit', type=int, help='Limit processing to N proposals (default: process all)')
return parser.parse_args()
def load_existing_data():
"""Load existing archived proposals data if available"""
if os.path.exists(ARCHIVED_PROPOSALS_FILE):
try:
with open(ARCHIVED_PROPOSALS_FILE, 'r', encoding='utf-8') as f:
data = json.load(f)
logger.info(f"Loaded {len(data.get('proposals', []))} existing proposals from {ARCHIVED_PROPOSALS_FILE}")
return data
except (json.JSONDecodeError, IOError) as e:
logger.error(f"Error loading existing data: {e}")
# Return empty structure if file doesn't exist or has errors
return {
'last_updated': None,
'proposals': []
}
def save_data(data):
"""Save data to JSON file"""
try:
# Update last_updated timestamp
data['last_updated'] = datetime.now().isoformat()
with open(ARCHIVED_PROPOSALS_FILE, 'w', encoding='utf-8') as f:
json.dump(data, f, indent=2, ensure_ascii=False)
logger.info(f"Saved {len(data.get('proposals', []))} proposals to {ARCHIVED_PROPOSALS_FILE}")
except IOError as e:
logger.error(f"Error saving data: {e}")
except Exception as e:
logger.error(f"Unexpected error saving data: {e}")
def fetch_page(url):
"""Fetch a page from the OSM wiki"""
headers = {
'User-Agent': USER_AGENT
}
try:
response = requests.get(url, headers=headers)
response.raise_for_status()
return response.text
except requests.exceptions.RequestException as e:
logger.error(f"Error fetching {url}: {e}")
return None
def get_proposal_urls():
"""Get URLs of all archived proposals"""
logger.info(f"Fetching archived proposals list from {ARCHIVED_PROPOSALS_URL}")
html = fetch_page(ARCHIVED_PROPOSALS_URL)
if not html:
return []
soup = BeautifulSoup(html, 'html.parser')
# Find all links in the category pages
proposal_urls = []
# Get proposals from the main category page
category_content = soup.select_one('#mw-pages')
if category_content:
for link in category_content.select('a'):
if link.get('title') and 'Category:' not in link.get('title'):
proposal_urls.append({
'title': link.get('title'),
'url': urljoin(ARCHIVED_PROPOSALS_URL, link.get('href'))
})
# Check if there are subcategories
subcategories = soup.select('#mw-subcategories a')
for subcat in subcategories:
if 'Category:' in subcat.get('title', ''):
logger.info(f"Found subcategory: {subcat.get('title')}")
subcat_url = urljoin(ARCHIVED_PROPOSALS_URL, subcat.get('href'))
# Fetch the subcategory page
time.sleep(RATE_LIMIT_DELAY) # Respect rate limits
subcat_html = fetch_page(subcat_url)
if subcat_html:
subcat_soup = BeautifulSoup(subcat_html, 'html.parser')
subcat_content = subcat_soup.select_one('#mw-pages')
if subcat_content:
for link in subcat_content.select('a'):
if link.get('title') and 'Category:' not in link.get('title'):
proposal_urls.append({
'title': link.get('title'),
'url': urljoin(ARCHIVED_PROPOSALS_URL, link.get('href'))
})
logger.info(f"Found {len(proposal_urls)} archived proposals")
return proposal_urls
def extract_username(text):
"""Extract username from a signature line"""
# Common patterns for signatures
patterns = [
r'--\s*\[\[User:([^|\]]+)(?:\|[^\]]+)?\]\]', # --[[User:Username|Username]]
r'--\s*\[\[User:([^|\]]+)\]\]', # --[[User:Username]]
r'--\s*\[\[User talk:([^|\]]+)(?:\|[^\]]+)?\]\]', # --[[User talk:Username|Username]]
r'--\s*\[\[User talk:([^|\]]+)\]\]', # --[[User talk:Username]]
r'--\s*\[\[Special:Contributions/([^|\]]+)(?:\|[^\]]+)?\]\]', # --[[Special:Contributions/Username|Username]]
r'--\s*\[\[Special:Contributions/([^|\]]+)\]\]', # --[[Special:Contributions/Username]]
]
for pattern in patterns:
match = re.search(pattern, text)
if match:
return match.group(1).strip()
# If no match found with the patterns, try to find any username-like string
match = re.search(r'--\s*([A-Za-z0-9_-]+)', text)
if match:
return match.group(1).strip()
return None
def extract_date(text):
"""Extract date from a signature line"""
# Look for common date formats in signatures
date_patterns = [
r'(\d{1,2}:\d{2}, \d{1,2} [A-Za-z]+ \d{4})', # 15:30, 25 December 2023
r'(\d{1,2} [A-Za-z]+ \d{4} \d{1,2}:\d{2})', # 25 December 2023 15:30
r'(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})', # 2023-12-25T15:30:00
]
for pattern in date_patterns:
match = re.search(pattern, text)
if match:
return match.group(1)
return None
def determine_vote_type(text):
"""Determine the type of vote from the text"""
text_lower = text.lower()
for vote_type, patterns in VOTE_PATTERNS.items():
for pattern in patterns:
if re.search(pattern, text_lower, re.IGNORECASE):
return vote_type
return None
def extract_votes(html):
"""Extract voting information from proposal HTML"""
soup = BeautifulSoup(html, 'html.parser')
# Find the voting section
voting_section = None
for heading in soup.find_all(['h2', 'h3']):
heading_text = heading.get_text().lower()
if 'voting' in heading_text or 'votes' in heading_text or 'poll' in heading_text:
voting_section = heading
break
if not voting_section:
logger.warning("No voting section found")
return {
'approve': {'count': 0, 'users': []},
'oppose': {'count': 0, 'users': []},
'abstain': {'count': 0, 'users': []}
}
# Get the content after the voting section heading
votes_content = []
current = voting_section.next_sibling
# Collect all elements until the next heading or the end of the document
while current and current.name not in ['h2', 'h3']:
if current.name: # Skip NavigableString objects
votes_content.append(current)
current = current.next_sibling
# Process vote lists
votes = {
'approve': {'count': 0, 'users': []},
'oppose': {'count': 0, 'users': []},
'abstain': {'count': 0, 'users': []}
}
# For tracking vote dates to calculate duration
all_vote_dates = []
# Look for lists of votes
for element in votes_content:
if element.name == 'ul':
for li in element.find_all('li'):
vote_text = li.get_text()
vote_type = determine_vote_type(vote_text)
if vote_type:
username = extract_username(vote_text)
date = extract_date(vote_text)
# Extract comment by removing vote declaration and signature
comment = vote_text
# Remove vote declaration patterns
for pattern in VOTE_PATTERNS[vote_type]:
comment = re.sub(pattern, '', comment, flags=re.IGNORECASE)
# Remove signature
signature_patterns = [
r'--\s*\[\[User:[^]]+\]\].*$',
r'--\s*\[\[User talk:[^]]+\]\].*$',
r'--\s*\[\[Special:Contributions/[^]]+\]\].*$',
r'--\s*[A-Za-z0-9_-]+.*$'
]
for pattern in signature_patterns:
comment = re.sub(pattern, '', comment, flags=re.IGNORECASE)
# Clean up the comment
comment = comment.strip()
if username:
votes[vote_type]['count'] += 1
votes[vote_type]['users'].append({
'username': username,
'date': date,
'comment': comment
})
# Add date to list for duration calculation if it's valid
if date:
try:
# Try to parse the date in different formats
parsed_date = None
for date_format in [
'%H:%M, %d %B %Y', # 15:30, 25 December 2023
'%d %B %Y %H:%M', # 25 December 2023 15:30
'%Y-%m-%dT%H:%M:%S' # 2023-12-25T15:30:00
]:
try:
parsed_date = datetime.strptime(date, date_format)
break
except ValueError:
continue
if parsed_date:
all_vote_dates.append(parsed_date)
except Exception as e:
logger.warning(f"Could not parse date '{date}': {e}")
# Calculate vote duration if we have at least two dates
if len(all_vote_dates) >= 2:
all_vote_dates.sort()
first_vote = all_vote_dates[0]
last_vote = all_vote_dates[-1]
vote_duration_days = (last_vote - first_vote).days
votes['first_vote'] = first_vote.strftime('%Y-%m-%d')
votes['last_vote'] = last_vote.strftime('%Y-%m-%d')
votes['duration_days'] = vote_duration_days
return votes
def extract_proposal_metadata(html, url):
"""Extract metadata about the proposal"""
soup = BeautifulSoup(html, 'html.parser')
# Get title
title_element = soup.select_one('#firstHeading')
title = title_element.get_text() if title_element else "Unknown Title"
# Get last modified date
last_modified = None
footer_info = soup.select_one('#footer-info-lastmod')
if footer_info:
last_modified_text = footer_info.get_text()
match = re.search(r'(\d{1,2} [A-Za-z]+ \d{4})', last_modified_text)
if match:
last_modified = match.group(1)
# Get content element for further processing
content = soup.select_one('#mw-content-text')
# Get proposer from the page
proposer = None
# Get proposal status from the page
status = None
# Look for table rows to find proposer and status
if content:
# Look for table rows
for row in content.select('tr'):
# Check if the row has at least two cells (th and td)
cells = row.select('th, td')
if len(cells) >= 2:
# Get the header text from the first cell
header_text = cells[0].get_text().strip().lower()
# Check for "Proposed by:" to find proposer
if "proposed by" in header_text:
# Look for user link in the next cell
user_link = cells[1].select_one('a[href*="/wiki/User:"]')
if user_link:
# Extract username from the link
href = user_link.get('href', '')
link_title = user_link.get('title', '')  # separate name so the page title is not overwritten
# Try to get username from the title attribute first
if link_title and link_title.startswith('User:'):
proposer = link_title[5:] # Remove 'User:' prefix
# Otherwise try to extract from href
elif href:
href_match = re.search(r'/wiki/User:([^/]+)', href)
if href_match:
proposer = href_match.group(1)
# If still no proposer, use the link text
if not proposer and user_link.get_text():
proposer = user_link.get_text().strip()
logger.info(f"Found proposer in table: {proposer}")
# Check for "Proposal status:" to find status
elif "proposal status" in header_text:
# Get the status from the next cell
status_cell = cells[1]
# First try to find a link with a category title containing status
status_link = status_cell.select_one('a[title*="Category:Proposals with"]')
if status_link:
# Extract status from the title attribute
status_match = re.search(r'Category:Proposals with "([^"]+)" status', status_link.get('title', ''))
if status_match:
status = status_match.group(1)
logger.info(f"Found status in table link: {status}")
# If no status found in link, try to get text content
if not status:
status_text = status_cell.get_text().strip()
# Try to match one of the known statuses
known_statuses = [
"Draft", "Proposed", "Voting", "Post-vote", "Approved",
"Rejected", "Abandoned", "Canceled", "Obsoleted",
"Inactive", "Undefined"
]
for known_status in known_statuses:
if known_status.lower() in status_text.lower():
status = known_status
logger.info(f"Found status in table text: {status}")
break
# If no proposer found in table, try the first paragraph method
if not proposer:
first_paragraph = soup.select_one('#mw-content-text p')
if first_paragraph:
proposer_match = re.search(r'(?:proposed|created|authored)\s+by\s+\[\[User:([^|\]]+)', first_paragraph.get_text())
if proposer_match:
proposer = proposer_match.group(1)
logger.info(f"Found proposer in paragraph: {proposer}")
# Count sections, links, and words
section_count = len(soup.select('#mw-content-text h2, #mw-content-text h3, #mw-content-text h4')) if content else 0
# Count links excluding user/talk pages (voting signatures)
links = []
if content:
for link in content.select('a'):
href = link.get('href', '')
if href and not re.search(r'User:|User_talk:|Special:Contributions', href):
links.append(href)
link_count = len(links)
# Approximate word count
word_count = 0
if content:
# Get text content excluding navigation elements
for nav in content.select('.navbox, .ambox, .tmbox, .mw-editsection'):
nav.decompose()
# Also exclude the voting section to count only the proposal content
voting_section = None
for heading in content.find_all(['h2', 'h3']):
heading_text = heading.get_text().lower()
if 'voting' in heading_text or 'votes' in heading_text or 'poll' in heading_text:
voting_section = heading
break
if voting_section:
# Remove the voting section and everything after it
current = voting_section
while current:
next_sibling = current.next_sibling
current.decompose()
current = next_sibling
# Count words in the remaining content
text = content.get_text()
word_count = len(re.findall(r'\b\w+\b', text))
return {
'title': title,
'url': url,
'last_modified': last_modified,
'proposer': proposer,
'status': status,
'section_count': section_count,
'link_count': link_count,
'word_count': word_count
}
def process_proposal(proposal, force=False):
"""Process a single proposal and extract voting information"""
url = proposal['url']
title = proposal['title']
logger.info(f"Processing proposal: {title}")
# Fetch the proposal page
html = fetch_page(url)
if not html:
return None
# Extract metadata
metadata = extract_proposal_metadata(html, url)
# Extract votes
votes = extract_votes(html)
# Combine metadata and votes
result = {**metadata, 'votes': votes}
# Calculate total votes and percentages
total_votes = votes['approve']['count'] + votes['oppose']['count'] + votes['abstain']['count']
if total_votes > 0:
result['total_votes'] = total_votes
result['approve_percentage'] = round((votes['approve']['count'] / total_votes) * 100, 1)
result['oppose_percentage'] = round((votes['oppose']['count'] / total_votes) * 100, 1)
result['abstain_percentage'] = round((votes['abstain']['count'] / total_votes) * 100, 1)
else:
result['total_votes'] = 0
result['approve_percentage'] = 0
result['oppose_percentage'] = 0
result['abstain_percentage'] = 0
return result
def main():
"""Main function to execute the script"""
args = parse_arguments()
force = args.force
limit = args.limit
logger.info("Starting fetch_archived_proposals.py")
if limit:
logger.info(f"Processing limited to {limit} proposals")
# Load existing data
data = load_existing_data()
# Get list of proposal URLs
proposal_urls = get_proposal_urls()
# Apply limit if specified
if limit and limit < len(proposal_urls):
logger.info(f"Limiting processing from {len(proposal_urls)} to {limit} proposals")
proposal_urls = proposal_urls[:limit]
# Create a map of existing proposals by URL for quick lookup
existing_proposals = {p['url']: p for p in data.get('proposals', [])}
# Process each proposal
new_proposals = []
processed_count = 0
for proposal in proposal_urls:
url = proposal['url']
original_title = proposal['title']
# Skip if already processed and not forcing refresh
if url in existing_proposals and not force:
logger.info(f"Skipping already processed proposal: {original_title}")
new_proposals.append(existing_proposals[url])
continue
# Process the proposal
time.sleep(RATE_LIMIT_DELAY) # Respect rate limits
processed = process_proposal(proposal, force)
if processed:
# Ensure the title is preserved from the original proposal
if processed.get('title') != original_title:
logger.warning(f"Title changed during processing from '{original_title}' to '{processed.get('title')}'. Restoring original title.")
processed['title'] = original_title
new_proposals.append(processed)
processed_count += 1
# Check if we've reached the limit
if limit and processed_count >= limit:
logger.info(f"Reached limit of {limit} processed proposals")
break
# Update the data
data['proposals'] = new_proposals
# Calculate global statistics
total_proposals = len(new_proposals)
total_votes = sum(p.get('total_votes', 0) for p in new_proposals)
avg_votes_per_proposal = round(total_votes / total_proposals, 1) if total_proposals > 0 else 0
# Count unique voters
all_voters = set()
for p in new_proposals:
for vote_type in ['approve', 'oppose', 'abstain']:
for user in p.get('votes', {}).get(vote_type, {}).get('users', []):
if 'username' in user:
all_voters.add(user['username'])
# Find most active voters
voter_counts = {}
for p in new_proposals:
for vote_type in ['approve', 'oppose', 'abstain']:
for user in p.get('votes', {}).get(vote_type, {}).get('users', []):
if 'username' in user:
username = user['username']
if username not in voter_counts:
voter_counts[username] = {'total': 0, 'approve': 0, 'oppose': 0, 'abstain': 0}
voter_counts[username]['total'] += 1
voter_counts[username][vote_type] += 1
# Sort voters by total votes
top_voters = sorted(
[{'username': k, **v} for k, v in voter_counts.items()],
key=lambda x: x['total'],
reverse=True
)[:100] # Top 100 voters
# Add statistics to the data
data['statistics'] = {
'total_proposals': total_proposals,
'total_votes': total_votes,
'avg_votes_per_proposal': avg_votes_per_proposal,
'unique_voters': len(all_voters),
'top_voters': top_voters
}
# Save the data
save_data(data)
logger.info("Script completed successfully")
if __name__ == "__main__":
main()

View file

@ -7,6 +7,8 @@ import json
import logging
import argparse
import os
import re
import time
from datetime import datetime, timedelta
# Configure logging
@ -26,6 +28,25 @@ OUTPUT_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'proposal
# Cache timeout (in hours)
CACHE_TIMEOUT = 1
# Vote patterns (same as in fetch_archived_proposals.py)
VOTE_PATTERNS = {
'approve': [
r'I\s+(?:(?:strongly|fully|completely|wholeheartedly)\s+)?(?:approve|support|agree\s+with)\s+this\s+proposal',
r'I\s+vote\s+(?:to\s+)?(?:approve|support)',
r'(?:Symbol\s+support\s+vote\.svg|Symbol_support_vote\.svg)',
],
'oppose': [
r'I\s+(?:(?:strongly|fully|completely|wholeheartedly)\s+)?(?:oppose|disagree\s+with|reject|do\s+not\s+support)\s+this\s+proposal',
r'I\s+vote\s+(?:to\s+)?(?:oppose|reject|against)',
r'(?:Symbol\s+oppose\s+vote\.svg|Symbol_oppose_vote\.svg)',
],
'abstain': [
r'I\s+(?:have\s+comments\s+but\s+)?abstain\s+from\s+voting',
r'I\s+(?:have\s+comments\s+but\s+)?(?:neither\s+approve\s+nor\s+oppose|am\s+neutral)',
r'(?:Symbol\s+abstain\s+vote\.svg|Symbol_abstain_vote\.svg)',
]
}
def should_update_cache():
"""
Check if the cache file exists and if it's older than the cache timeout
@ -46,6 +67,134 @@ def should_update_cache():
logger.info(f"Cache is still fresh (less than {CACHE_TIMEOUT} hour(s) old)")
return False
def fetch_page(url):
"""
Fetch a page from the OSM wiki
"""
try:
response = requests.get(url)
response.raise_for_status()
return response.text
except requests.exceptions.RequestException as e:
logger.error(f"Error fetching {url}: {e}")
return None
def extract_username(text):
"""
Extract username from a signature line
"""
# Common patterns for signatures
patterns = [
r'--\s*\[\[User:([^|\]]+)(?:\|[^\]]+)?\]\]', # --[[User:Username|Username]]
r'--\s*\[\[User:([^|\]]+)\]\]', # --[[User:Username]]
r'--\s*\[\[User talk:([^|\]]+)(?:\|[^\]]+)?\]\]', # --[[User talk:Username|Username]]
r'--\s*\[\[User talk:([^|\]]+)\]\]', # --[[User talk:Username]]
r'--\s*\[\[Special:Contributions/([^|\]]+)(?:\|[^\]]+)?\]\]', # --[[Special:Contributions/Username|Username]]
r'--\s*\[\[Special:Contributions/([^|\]]+)\]\]', # --[[Special:Contributions/Username]]
]
for pattern in patterns:
match = re.search(pattern, text)
if match:
return match.group(1).strip()
# If no match found with the patterns, try to find any username-like string
match = re.search(r'--\s*([A-Za-z0-9_-]+)', text)
if match:
return match.group(1).strip()
return None
def extract_date(text):
"""
Extract date from a signature line
"""
# Look for common date formats in signatures
date_patterns = [
r'(\d{1,2}:\d{2}, \d{1,2} [A-Za-z]+ \d{4})', # 15:30, 25 December 2023
r'(\d{1,2} [A-Za-z]+ \d{4} \d{1,2}:\d{2})', # 25 December 2023 15:30
r'(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})', # 2023-12-25T15:30:00
]
for pattern in date_patterns:
match = re.search(pattern, text)
if match:
return match.group(1)
return None
def determine_vote_type(text):
"""
Determine the type of vote from the text
"""
text_lower = text.lower()
for vote_type, patterns in VOTE_PATTERNS.items():
for pattern in patterns:
if re.search(pattern, text_lower, re.IGNORECASE):
return vote_type
return None
def extract_votes(html):
"""
Extract voting information from proposal HTML
"""
soup = BeautifulSoup(html, 'html.parser')
# Find the voting section
voting_section = None
for heading in soup.find_all(['h2', 'h3']):
heading_text = heading.get_text().lower()
if 'voting' in heading_text or 'votes' in heading_text or 'poll' in heading_text:
voting_section = heading
break
if not voting_section:
logger.warning("No voting section found")
return {
'approve': {'count': 0, 'users': []},
'oppose': {'count': 0, 'users': []},
'abstain': {'count': 0, 'users': []}
}
# Get the content after the voting section heading
votes_content = []
current = voting_section.next_sibling
# Collect all elements until the next heading or the end of the document
while current and current.name not in ['h2', 'h3']:
if current.name: # Skip NavigableString objects
votes_content.append(current)
current = current.next_sibling
# Process vote lists
votes = {
'approve': {'count': 0, 'users': []},
'oppose': {'count': 0, 'users': []},
'abstain': {'count': 0, 'users': []}
}
# Look for lists of votes
for element in votes_content:
if element.name == 'ul':
for li in element.find_all('li'):
vote_text = li.get_text()
vote_type = determine_vote_type(vote_text)
if vote_type:
username = extract_username(vote_text)
date = extract_date(vote_text)
if username:
votes[vote_type]['count'] += 1
votes[vote_type]['users'].append({
'username': username,
'date': date
})
return votes
def fetch_voting_proposals():
"""
Fetch proposals with "Voting" status from the OSM Wiki
@ -69,12 +218,72 @@ def fetch_voting_proposals():
proposal_title = link.text.strip()
proposal_url = 'https://wiki.openstreetmap.org' + link.get('href', '')
proposals.append({
# Create a basic proposal object
proposal = {
'title': proposal_title,
'url': proposal_url,
'status': 'Voting',
'type': 'voting'
})
}
# Fetch the proposal page to extract voting information
logger.info(f"Fetching proposal page: {proposal_title}")
html = fetch_page(proposal_url)
if html:
# Extract voting information
votes = extract_votes(html)
# Add voting information to the proposal
proposal['votes'] = votes
# Calculate total votes and percentages
total_votes = votes['approve']['count'] + votes['oppose']['count'] + votes['abstain']['count']
if total_votes > 0:
proposal['total_votes'] = total_votes
proposal['approve_percentage'] = round((votes['approve']['count'] / total_votes) * 100, 1)
proposal['oppose_percentage'] = round((votes['oppose']['count'] / total_votes) * 100, 1)
proposal['abstain_percentage'] = round((votes['abstain']['count'] / total_votes) * 100, 1)
else:
proposal['total_votes'] = 0
proposal['approve_percentage'] = 0
proposal['oppose_percentage'] = 0
proposal['abstain_percentage'] = 0
# Extract proposer from the page
soup = BeautifulSoup(html, 'html.parser')
content = soup.select_one('#mw-content-text')
if content:
# Look for table rows with "Proposed by:" in the header cell
for row in content.select('tr'):
cells = row.select('th, td')
if len(cells) >= 2:
header_text = cells[0].get_text().strip().lower()
if "proposed by" in header_text:
user_link = cells[1].select_one('a[href*="/wiki/User:"]')
if user_link:
href = user_link.get('href', '')
title = user_link.get('title', '')
# Try to get username from title attribute first
if title and title.startswith('User:'):
proposal['proposer'] = title[5:] # Remove 'User:' prefix
# Otherwise try to extract from href
elif href:
href_match = re.search(r'/wiki/User:([^/]+)', href)
if href_match:
proposal['proposer'] = href_match.group(1)
# If still no proposer, use the link text
if 'proposer' not in proposal and user_link.get_text():
proposal['proposer'] = user_link.get_text().strip()
# Add a delay to avoid overloading the server
time.sleep(1)
proposals.append(proposal)
logger.info(f"Found {len(proposals)} voting proposals")
return proposals

View file

@ -2389,7 +2389,596 @@
],
"common": []
},
"proposed_translation": " Voici la traduction du texte anglais vers le français :\n\nType Description\nType de relation. Groupe : propriétés Utilisés sur ces éléments\nValeurs documentées : 20\nVoir aussi * :type =*\nStatut : de facto type = *\nPlus de détails à la page info des balises\nTools pour cette balise taginfo · AD · AT · BR · BY · CH · CN · CZ · DE · DK · FI · FR · GB · GR · HU · IN · IR · IT · LI · LU · JP · KP · KR · NL · PL · PT · RU · ES · AR · MX · CO · BO · CL · EC · PY · PE · UY · VE · TW · UA · US · VN\noverpass-turbo OSM Tag History\ntype =* sur un objet relation spécifie son type et les interactions entre ses membres. Les types de relations établis et proposés sont listés ci-dessous. type a été également occasionnellement utilisé comme balise supplémentaire pour spécifier une \"variante\" d'une catégorie de fonctionnalité sur les voies et points. Cette approche est en conflit avec son utilisation dans les relations et devrait être évitée . De plus, pour les éléments possédant plusieurs balises, il n'est pas clair à quelle balise le type correspond. Au lieu de cela, utiliser une approche basée sur un suffixe ou une sous-balise, comme décrit dans *:type =* .\nContenu\n1 Relations établies\n2 Relations peu utilisées\n3 Utilisations proposées\n3.1 Junctions, intersections, grade separated crossings, and embankments\n3.2 Area hierarchies et autres relations pour les zones\n3.3 Adressage\n3.4 Autres\n4 Possibles erreurs de balises\n5 Qualité Assurance\n6 Voir aussi\nRelations établies\nType Statut Membres Commentaires Statistics Image\nmultipolygon de facto ( )\nZones où l'enveloppe se compose de plusieurs voies, ou qui possèdent des trous.\nroute de facto\nUne route établie (généralement signalée) sur une voie\nvaleurs documentées : 10\nvoir aussi *:route =*\nturn_restriction =*\nOpposition à la balise\n3.5 Système de signalisation\n\nNota bene : Il y a plusieurs erreurs dans le texte original anglais, ainsi que des erreurs de traduction dans la version française. C'est pourquoi j'ai fourni une traduction corrigée ici."
"proposed_translation": " Voici la traduction du texte anglais vers le français :\n\nType Description\nType de relation. Groupe : propriétés Utilisés sur ces éléments\nValeurs documentées : 20\nVoir aussi * :type =*\nStatut : de facto type = *\nPlus de détails à la page info des balises\nTools pour cette balise taginfo · AD · AT · BR · BY · CH · CN · CZ · DE · DK · FI · FR · GB · GR · HU · IN · IR · IT · LI · LU · JP · KP · KR · NL · PL · PT · RU · ES · AR · MX · CO · BO · CL · EC · PY · PE · UY · VE · TW · UA · US · VN\noverpass-turbo OSM Tag History\ntype =* sur un objet relation spécifie son type et les interactions entre ses membres. Les types de relations établis et proposés sont listés ci-dessous. type a été également occasionnellement utilisé comme balise supplémentaire pour spécifier une \"variante\" d'une catégorie de fonctionnalité sur les voies et points. Cette approche est en conflit avec son utilisation dans les relations et devrait être évitée . De plus, pour les éléments possédant plusieurs balises, il n'est pas clair à quelle balise le type correspond. Au lieu de cela, utiliser une approche basée sur un suffixe ou une sous-balise, comme décrit dans *:type =* .\nContenu\n1 Relations établies\n2 Relations peu utilisées\n3 Utilisations proposées\n3.1 Junctions, intersections, grade separated crossings, and embankments\n3.2 Area hierarchies et autres relations pour les zones\n3.3 Adressage\n3.4 Autres\n4 Possibles erreurs de balises\n5 Qualité Assurance\n6 Voir aussi\nRelations établies\nType Statut Membres Commentaires Statistics Image\nmultipolygon de facto ( )\nZones où l'enveloppe se compose de plusieurs voies, ou qui possèdent des trous.\nroute de facto\nUne route établie (généralement signalée) sur une voie\nvaleurs documentées : 10\nvoir aussi *:route =*\nturn_restriction =*\nOpposition à la balise\n3.5 Système de signalisation\n\nNota bene : Il y a plusieurs erreurs dans le texte original anglais, ainsi que des erreurs de traduction dans la version française. C'est pourquoi j'ai fourni une traduction corrigée ici.",
"grammar_suggestions": [
{
"paragraph": 2,
"start": 2043,
"end": 2045,
"type": "typo",
"message": "Pas despace avant ce signe.",
"suggestions": [
")"
],
"context": ""
},
{
"paragraph": 2,
"start": 2326,
"end": 2328,
"type": "typo",
"message": "Pas despace avant ce signe.",
"suggestions": [
")"
],
"context": ""
},
{
"paragraph": 2,
"start": 496,
"end": 498,
"type": "typo",
"message": "Pas despace avant un point.",
"suggestions": [
"."
],
"context": ""
},
{
"paragraph": 2,
"start": 1429,
"end": 1431,
"type": "typo",
"message": "Pas despace avant un point.",
"suggestions": [
"."
],
"context": ""
},
{
"paragraph": 2,
"start": 2673,
"end": 2675,
"type": "typo",
"message": "Pas despace avant un point.",
"suggestions": [
"."
],
"context": ""
},
{
"paragraph": 2,
"start": 1846,
"end": 1848,
"type": "typo",
"message": "Pas despace avant une virgule.",
"suggestions": [
","
],
"context": ""
},
{
"paragraph": 2,
"start": 2579,
"end": 2581,
"type": "typo",
"message": "Pas despace avant une virgule.",
"suggestions": [
","
],
"context": ""
},
{
"paragraph": 2,
"start": 2690,
"end": 2693,
"type": "typo",
"message": "Guillemets isolés.",
"suggestions": [
" « ",
" » ",
" “",
"” "
],
"context": ""
},
{
"paragraph": 2,
"start": 2767,
"end": 2769,
"type": "typo",
"message": "Guillemets fermants.",
"suggestions": [
" »",
"”"
],
"context": ""
},
{
"paragraph": 2,
"start": 410,
"end": 412,
"type": "apos",
"message": "Apostrophe typographique.",
"suggestions": [
"L"
],
"context": ""
},
{
"paragraph": 2,
"start": 482,
"end": 484,
"type": "apos",
"message": "Apostrophe typographique.",
"suggestions": [
"d"
],
"context": ""
},
{
"paragraph": 2,
"start": 532,
"end": 534,
"type": "apos",
"message": "Apostrophe typographique.",
"suggestions": [
"s"
],
"context": ""
},
{
"paragraph": 2,
"start": 555,
"end": 557,
"type": "apos",
"message": "Apostrophe typographique.",
"suggestions": [
"l"
],
"context": ""
},
{
"paragraph": 2,
"start": 579,
"end": 581,
"type": "apos",
"message": "Apostrophe typographique.",
"suggestions": [
"d"
],
"context": ""
},
{
"paragraph": 2,
"start": 1712,
"end": 1714,
"type": "apos",
"message": "Apostrophe typographique.",
"suggestions": [
"l"
],
"context": ""
},
{
"paragraph": 2,
"start": 1870,
"end": 1872,
"type": "apos",
"message": "Apostrophe typographique.",
"suggestions": [
"d"
],
"context": ""
},
{
"paragraph": 2,
"start": 1992,
"end": 1994,
"type": "apos",
"message": "Apostrophe typographique.",
"suggestions": [
"l"
],
"context": ""
},
{
"paragraph": 2,
"start": 2464,
"end": 2466,
"type": "apos",
"message": "Apostrophe typographique.",
"suggestions": [
"d"
],
"context": ""
},
{
"paragraph": 2,
"start": 2509,
"end": 2511,
"type": "apos",
"message": "Apostrophe typographique.",
"suggestions": [
"l"
],
"context": ""
},
{
"paragraph": 2,
"start": 1723,
"end": 1727,
"type": "maj",
"message": "Après un point, une majuscule est généralement requise.",
"suggestions": [
"Type"
],
"context": ""
},
{
"paragraph": 2,
"start": 1292,
"end": 1304,
"type": "typo",
"message": "Il manque un espace.",
"suggestions": [
" multipolygon"
],
"context": ""
},
{
"paragraph": 2,
"start": 1334,
"end": 1339,
"type": "typo",
"message": "Il manque un espace.",
"suggestions": [
" route"
],
"context": ""
},
{
"paragraph": 2,
"start": 1372,
"end": 1380,
"type": "typo",
"message": "Il manque un espace.",
"suggestions": [
" building"
],
"context": ""
},
{
"paragraph": 2,
"start": 2484,
"end": 2490,
"type": "typo",
"message": "Il manque un espace.",
"suggestions": [
" source"
],
"context": ""
},
{
"paragraph": 2,
"start": 2570,
"end": 2579,
"type": "typo",
"message": "Il manque un espace.",
"suggestions": [
" Attributs"
],
"context": ""
},
{
"paragraph": 2,
"start": 2649,
"end": 2657,
"type": "typo",
"message": "Il manque un espace.",
"suggestions": [
" Éléments"
],
"context": ""
},
{
"paragraph": 2,
"start": 2749,
"end": 2753,
"type": "typo",
"message": "Il manque un espace.",
"suggestions": [
" type"
],
"context": ""
},
{
"paragraph": 2,
"start": 51,
"end": 52,
"type": "nbsp",
"message": "Il manque un espace insécable.",
"suggestions": [
" :"
],
"context": ""
},
{
"paragraph": 2,
"start": 96,
"end": 98,
"type": "nbsp",
"message": "Il manque un espace insécable.",
"suggestions": [
" :"
],
"context": ""
},
{
"paragraph": 2,
"start": 617,
"end": 619,
"type": "nbsp",
"message": "Il manque un espace insécable.",
"suggestions": [
" :"
],
"context": ""
},
{
"paragraph": 2,
"start": 644,
"end": 646,
"type": "nbsp",
"message": "Il manque un espace insécable.",
"suggestions": [
" :"
],
"context": ""
},
{
"paragraph": 2,
"start": 678,
"end": 680,
"type": "nbsp",
"message": "Il manque un espace insécable.",
"suggestions": [
" :"
],
"context": ""
},
{
"paragraph": 2,
"start": 698,
"end": 700,
"type": "nbsp",
"message": "Il manque un espace insécable.",
"suggestions": [
" :"
],
"context": ""
},
{
"paragraph": 2,
"start": 780,
"end": 782,
"type": "nbsp",
"message": "Il manque un espace insécable.",
"suggestions": [
" :"
],
"context": ""
},
{
"paragraph": 2,
"start": 817,
"end": 819,
"type": "nbsp",
"message": "Il manque un espace insécable.",
"suggestions": [
" :"
],
"context": ""
},
{
"paragraph": 2,
"start": 844,
"end": 846,
"type": "nbsp",
"message": "Il manque un espace insécable.",
"suggestions": [
" :"
],
"context": ""
},
{
"paragraph": 2,
"start": 859,
"end": 861,
"type": "nbsp",
"message": "Il manque un espace insécable.",
"suggestions": [
" :"
],
"context": ""
},
{
"paragraph": 2,
"start": 942,
"end": 944,
"type": "nbsp",
"message": "Il manque un espace insécable.",
"suggestions": [
" :"
],
"context": ""
},
{
"paragraph": 2,
"start": 962,
"end": 964,
"type": "nbsp",
"message": "Il manque un espace insécable.",
"suggestions": [
" :"
],
"context": ""
},
{
"paragraph": 2,
"start": 1275,
"end": 1277,
"type": "nbsp",
"message": "Il manque un espace insécable.",
"suggestions": [
" :"
],
"context": ""
},
{
"paragraph": 2,
"start": 1317,
"end": 1319,
"type": "nbsp",
"message": "Il manque un espace insécable.",
"suggestions": [
" :"
],
"context": ""
},
{
"paragraph": 2,
"start": 1355,
"end": 1357,
"type": "nbsp",
"message": "Il manque un espace insécable.",
"suggestions": [
" :"
],
"context": ""
},
{
"paragraph": 2,
"start": 1666,
"end": 1668,
"type": "nbsp",
"message": "Il manque un espace insécable.",
"suggestions": [
" :"
],
"context": ""
},
{
"paragraph": 2,
"start": 1731,
"end": 1733,
"type": "nbsp",
"message": "Il manque un espace insécable.",
"suggestions": [
" :"
],
"context": ""
},
{
"paragraph": 2,
"start": 1946,
"end": 1948,
"type": "nbsp",
"message": "Il manque un espace insécable.",
"suggestions": [
" :"
],
"context": ""
},
{
"paragraph": 2,
"start": 2136,
"end": 2138,
"type": "nbsp",
"message": "Il manque un espace insécable.",
"suggestions": [
" :"
],
"context": ""
},
{
"paragraph": 2,
"start": 2158,
"end": 2160,
"type": "nbsp",
"message": "Il manque un espace insécable.",
"suggestions": [
" :"
],
"context": ""
},
{
"paragraph": 2,
"start": 2438,
"end": 2440,
"type": "nbsp",
"message": "Il manque un espace insécable.",
"suggestions": [
" :"
],
"context": ""
},
{
"paragraph": 2,
"start": 2498,
"end": 2500,
"type": "nbsp",
"message": "Il manque un espace insécable.",
"suggestions": [
" :"
],
"context": ""
},
{
"paragraph": 2,
"start": 2760,
"end": 2767,
"type": "num",
"message": "Formatage des grands nombres.",
"suggestions": [
"2 060 502"
],
"context": ""
},
{
"paragraph": 2,
"start": 64,
"end": 71,
"type": "gn",
"message": "Accord de genre erroné avec « Propriétés ».",
"suggestions": [
"Utilisée"
],
"context": ""
},
{
"paragraph": 2,
"start": 53,
"end": 63,
"type": "gn",
"message": "Accord de nombre erroné avec « Utilisé ».",
"suggestions": [
"Propriété"
],
"context": ""
},
{
"paragraph": 2,
"start": 356,
"end": 358,
"type": "typo",
"message": "Nombre ordinal romain singulier. Exemples : IIᵉ, IIIᵉ, IVᵉ…",
"suggestions": [
"Vᵉ"
],
"context": ""
}
]
},
{
"key": "surface",

View file

@ -1,12 +1,6 @@
{
"last_updated": "2025-08-22T17:17:53.415750",
"last_updated": "2025-08-31T11:26:56.762197",
"voting_proposals": [
{
"title": "Proposal:Man made=ceremonial gate",
"url": "https://wiki.openstreetmap.org/wiki/Proposal:Man_made%3Dceremonial_gate",
"status": "Voting",
"type": "voting"
},
{
"title": "Proposal:Developer",
"url": "https://wiki.openstreetmap.org/wiki/Proposal:Developer",
@ -14,5 +8,13 @@
"type": "voting"
}
],
"recent_proposals": []
"recent_proposals": [
{
"title": "Proposal:Pole types for public transportation",
"url": "https://wiki.openstreetmap.org/wiki/Proposal:Pole_types_for_public_transportation",
"last_modified": "",
"modified_by": "NFarras",
"type": "recent"
}
]
}

View file

@ -0,0 +1,381 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
suggest_grammar_improvements.py
This script reads the outdated_pages.json file, selects a wiki page (by default the first one),
and uses grammalecte to check the grammar of the French page content.
The grammar suggestions are saved in the "grammar_suggestions" property of the JSON file.
The script is compatible with different versions of the grammalecte API:
- For newer versions where GrammarChecker is directly in the grammalecte module
- For older versions where GrammarChecker is in the grammalecte.fr module
Usage:
python suggest_grammar_improvements.py [--page KEY]
Options:
--page KEY Specify the key of the page to check (default: first page in the file)
Output:
- Updated outdated_pages.json file with grammar suggestions
"""
import json
import argparse
import logging
import requests
import os
import sys
import subprocess
from bs4 import BeautifulSoup
try:
import grammalecte
import grammalecte.text as txt
# Check if GrammarChecker is available directly in the grammalecte module (newer versions)
try:
from grammalecte import GrammarChecker
GRAMMALECTE_DIRECT_API = True
except ImportError:
# Try the older API structure with fr submodule
try:
import grammalecte.fr as gr_fr
GRAMMALECTE_DIRECT_API = False
except ImportError:
# Neither API is available
raise ImportError("Could not import GrammarChecker from grammalecte")
GRAMMALECTE_AVAILABLE = True
except ImportError:
GRAMMALECTE_AVAILABLE = False
GRAMMALECTE_DIRECT_API = False
# Configure logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(levelname)s - %(message)s',
datefmt='%Y-%m-%d %H:%M:%S'
)
logger = logging.getLogger(__name__)
# Constants
OUTDATED_PAGES_FILE = "outdated_pages.json"
def load_outdated_pages():
"""
Load the outdated pages from the JSON file
Returns:
list: List of dictionaries containing outdated page information
"""
try:
with open(OUTDATED_PAGES_FILE, 'r', encoding='utf-8') as f:
pages = json.load(f)
logger.info(f"Successfully loaded {len(pages)} pages from {OUTDATED_PAGES_FILE}")
return pages
except (IOError, json.JSONDecodeError) as e:
logger.error(f"Error loading pages from {OUTDATED_PAGES_FILE}: {e}")
return []
def save_to_json(data, filename):
"""
Save data to a JSON file
Args:
data: Data to save
filename (str): Name of the file
"""
try:
with open(filename, 'w', encoding='utf-8') as f:
json.dump(data, f, indent=2, ensure_ascii=False)
logger.info(f"Data saved to {filename}")
except IOError as e:
logger.error(f"Error saving data to {filename}: {e}")
def fetch_wiki_page_content(url):
"""
Fetch the content of a wiki page
Args:
url (str): URL of the wiki page
Returns:
str: Content of the wiki page
"""
try:
response = requests.get(url)
response.raise_for_status()
soup = BeautifulSoup(response.text, 'html.parser')
# Get the main content
content = soup.select_one('#mw-content-text')
if content:
# Remove script and style elements
for script in content.select('script, style'):
script.extract()
# Remove .languages elements
for languages_elem in content.select('.languages'):
languages_elem.extract()
# Get text
text = content.get_text(separator=' ', strip=True)
return text
else:
logger.warning(f"Could not find content in page: {url}")
return ""
except requests.exceptions.RequestException as e:
logger.error(f"Error fetching wiki page content: {e}")
return ""
def check_grammar_with_grammalecte(text):
"""
Check grammar using grammalecte
Args:
text (str): Text to check
Returns:
list: List of grammar suggestions
"""
if not GRAMMALECTE_AVAILABLE:
logger.error("Grammalecte is not installed. Please install it with: pip install grammalecte")
return []
try:
logger.info("Checking grammar with grammalecte")
# Initialize grammalecte based on which API version is available
if GRAMMALECTE_DIRECT_API:
# New API: GrammarChecker is directly in grammalecte module
logger.info("Using direct GrammarChecker API")
gce = GrammarChecker("fr")
# Split text into paragraphs
paragraphs = txt.getParagraph(text)
# Check grammar for each paragraph
suggestions = []
for i, paragraph in enumerate(paragraphs):
if paragraph.strip():
# Use getParagraphErrors method
errors = gce.getParagraphErrors(paragraph)
for error in errors:
# Filter out spelling errors if needed
if "sType" in error and error["sType"] != "WORD" and error.get("bError", True):
suggestion = {
"paragraph": i + 1,
"start": error.get("nStart", 0),
"end": error.get("nEnd", 0),
"type": error.get("sType", ""),
"message": error.get("sMessage", ""),
"suggestions": error.get("aSuggestions", []),
"context": paragraph[max(0, error.get("nStart", 0) - 20):min(len(paragraph), error.get("nEnd", 0) + 20)]
}
suggestions.append(suggestion)
else:
# Old API: GrammarChecker is in grammalecte.fr module
logger.info("Using legacy grammalecte.fr.GrammarChecker API")
gce = gr_fr.GrammarChecker("fr")
# Split text into paragraphs
paragraphs = txt.getParagraph(text)
# Check grammar for each paragraph
suggestions = []
for i, paragraph in enumerate(paragraphs):
if paragraph.strip():
# Use parse method for older API
for error in gce.parse(paragraph, "FR", False):
if error["sType"] != "WORD" and error["bError"]:
suggestion = {
"paragraph": i + 1,
"start": error["nStart"],
"end": error["nEnd"],
"type": error["sType"],
"message": error["sMessage"],
"suggestions": error.get("aSuggestions", []),
"context": paragraph[max(0, error["nStart"] - 20):min(len(paragraph), error["nEnd"] + 20)]
}
suggestions.append(suggestion)
logger.info(f"Found {len(suggestions)} grammar suggestions")
return suggestions
except Exception as e:
logger.error(f"Error checking grammar with grammalecte: {e}")
return []
def check_grammar_with_cli(text):
"""
Check grammar using grammalecte-cli command
Args:
text (str): Text to check
Returns:
list: List of grammar suggestions
"""
try:
logger.info("Checking grammar with grammalecte-cli")
# Create a temporary file with the text
temp_file = "temp_text_for_grammar_check.txt"
with open(temp_file, 'w', encoding='utf-8') as f:
f.write(text)
# Run grammalecte-cli
cmd = ["grammalecte-cli", "--json", "--file", temp_file]
result = subprocess.run(cmd, capture_output=True, text=True, encoding='utf-8')
# Remove temporary file
if os.path.exists(temp_file):
os.remove(temp_file)
if result.returncode != 0:
logger.error(f"Error running grammalecte-cli: {result.stderr}")
return []
# Parse JSON output
output = json.loads(result.stdout)
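        # The CLI's --json output is expected to contain a "data" list with one entry per
        # paragraph, each holding an "iParagraph" index and a "lGrammarErrors" list.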
# Extract grammar suggestions
suggestions = []
for paragraph_data in output.get("data", []):
paragraph_index = paragraph_data.get("iParagraph", 0)
for error in paragraph_data.get("lGrammarErrors", []):
suggestion = {
"paragraph": paragraph_index + 1,
"start": error.get("nStart", 0),
"end": error.get("nEnd", 0),
"type": error.get("sType", ""),
"message": error.get("sMessage", ""),
"suggestions": error.get("aSuggestions", []),
"context": error.get("sContext", "")
}
suggestions.append(suggestion)
logger.info(f"Found {len(suggestions)} grammar suggestions")
return suggestions
except Exception as e:
logger.error(f"Error checking grammar with grammalecte-cli: {e}")
return []
def check_grammar(text):
"""
Check grammar using available method (Python library or CLI)
Args:
text (str): Text to check
Returns:
list: List of grammar suggestions
"""
# Try using the Python library first
if GRAMMALECTE_AVAILABLE:
return check_grammar_with_grammalecte(text)
# Fall back to CLI if available
try:
# Check if grammalecte-cli is available
subprocess.run(["grammalecte-cli", "--help"], capture_output=True)
return check_grammar_with_cli(text)
except (subprocess.SubprocessError, FileNotFoundError):
logger.error("Neither grammalecte Python package nor grammalecte-cli is available.")
logger.error("Please install grammalecte with: pip install grammalecte")
return []
def select_page_for_grammar_check(pages, key=None):
"""
Select a page for grammar checking
Args:
pages (list): List of dictionaries containing page information
key (str): Key of the page to select (if None, select the first page)
Returns:
dict: Selected page or None if no suitable page found
"""
if not pages:
logger.warning("No pages found that need grammar checking")
return None
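    # Only pages with a French version ('fr_page') can be checked, since grammalecte
    # analyses French text.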
if key:
# Find the page with the specified key
for page in pages:
if page.get('key') == key:
# Check if the page has a French version
if page.get('fr_page') is None:
logger.warning(f"Page with key '{key}' does not have a French version")
return None
logger.info(f"Selected page for key '{key}' for grammar checking")
return page
logger.warning(f"No page found with key '{key}'")
return None
else:
# Select the first page that has a French version
for page in pages:
if page.get('fr_page') is not None:
logger.info(f"Selected first page with French version (key '{page['key']}') for grammar checking")
return page
logger.warning("No pages found with French versions")
return None
def main():
"""Main function to execute the script"""
parser = argparse.ArgumentParser(description="Suggest grammar improvements for an OSM wiki page using grammalecte")
parser.add_argument("--page", help="Key of the page to check (default: first page with a French version)")
args = parser.parse_args()
logger.info("Starting suggest_grammar_improvements.py")
# Load pages
pages = load_outdated_pages()
if not pages:
logger.error("No pages found. Run wiki_compare.py first.")
sys.exit(1)
# Select a page for grammar checking
selected_page = select_page_for_grammar_check(pages, args.page)
if not selected_page:
logger.error("Could not select a page for grammar checking.")
sys.exit(1)
# Get the French page URL
fr_url = selected_page.get('fr_page', {}).get('url')
if not fr_url:
logger.error(f"No French page URL found for key '{selected_page['key']}'")
sys.exit(1)
# Fetch the content of the French page
logger.info(f"Fetching content from {fr_url}")
content = fetch_wiki_page_content(fr_url)
if not content:
logger.error(f"Could not fetch content from {fr_url}")
sys.exit(1)
# Check grammar
logger.info(f"Checking grammar for key '{selected_page['key']}'")
suggestions = check_grammar(content)
if not suggestions:
logger.warning("No grammar suggestions found or grammar checker not available")
# Save the grammar suggestions in the JSON file
logger.info(f"Saving grammar suggestions for key '{selected_page['key']}'")
selected_page['grammar_suggestions'] = suggestions
# Save the updated data back to the file
save_to_json(pages, OUTDATED_PAGES_FILE)
logger.info("Script completed successfully")
if __name__ == "__main__":
main()

View file

@@ -1,5 +1,5 @@
{
"last_updated": "2025-08-22T15:22:37.234265",
"last_updated": "2025-08-31T11:28:25.264096",
"untranslated_pages": [
{
"title": "FR:2017 Ouragans Irma et Maria",
@@ -517,12 +517,6 @@
"url": "https://wiki.openstreetmap.org/wiki/FR:Compl%C3%A8te_Tes_Commerces/Avanc%C3%A9",
"has_translation": false
},
{
"title": "FR:Complète Tes Commerces/Debutant",
"key": "Complète Tes Commerces/Debutant",
"url": "https://wiki.openstreetmap.org/wiki/FR:Compl%C3%A8te_Tes_Commerces/Debutant",
"has_translation": false
},
{
"title": "FR:Complète Tes Commerces/Débutant",
"key": "Complète Tes Commerces/Débutant",
@@ -942,6 +936,12 @@
"key": "France/Map/Terres australes et antarctiques françaises",
"url": "https://wiki.openstreetmap.org/wiki/FR:France/Map/Terres_australes_et_antarctiques_fran%C3%A7aises",
"has_translation": false
},
{
"title": "FR:France/Map/Wallis-et-Futuna",
"key": "France/Map/Wallis-et-Futuna",
"url": "https://wiki.openstreetmap.org/wiki/FR:France/Map/Wallis-et-Futuna",
"has_translation": false
}
]
}