recup sources

Tykayn 2025-09-01 18:28:23 +02:00 committed by tykayn
parent 86622a19ea
commit 65fe2a35f9
155 changed files with 50969 additions and 0 deletions

wiki_compare/.gitignore vendored Normal file

@@ -0,0 +1,3 @@
*.json
.env
*.png

wiki_compare/CHANGES.md Normal file

@@ -0,0 +1,103 @@
# Implemented changes
This document summarizes the changes and new features implemented as part of the update to the OSM wiki page management system.
## 1. Tracking recent OSM wiki changes
### Added features
- Created a `fetch_recent_changes.py` script that fetches recent changes in the French namespace of the OSM wiki
- Added a new `/wiki/recent-changes` route in the WikiController controller
- Created a `wiki_recent_changes.html.twig` template to display the recent changes
- Updated the navigation to include a link to the recent changes page
### Usage
- Recent changes are fetched automatically every hour
- The page lists the recently modified pages with links to them
## 2. Heading hierarchy validation
### Added features
- Implemented logic to detect incorrect heading hierarchies (for example, an h4 directly under an h2 with no intermediate h3)
- Added visual indicators (badges) flagging incorrect hierarchies in the section lists
- Updated the `wiki_compare.html.twig` template to display these indicators
### Usage
- Incorrect hierarchies are detected automatically when comparing wiki pages
- A red badge with an exclamation mark is displayed next to headings whose hierarchy is incorrect (a sketch of the check follows this list)
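The actual validation lives in `detectHeadingHierarchyErrors()` in `WikiController.php`; the Python snippet below is only a minimal sketch of the kind of level-skip check described above, not the project's implementation.
```python
def find_hierarchy_errors(heading_levels):
    """Return indices where a heading skips a level (e.g. an h4 directly under an h2).

    `heading_levels` is the ordered list of heading levels found in a page,
    e.g. [2, 3, 2, 4]: the final h4 follows an h2 with no h3 in between.
    """
    errors = []
    previous = None
    for index, level in enumerate(heading_levels):
        if previous is not None and level > previous + 1:
            errors.append(index)
        previous = level
    return errors

# Example: h2, h3, h2, h4 -> the h4 at index 3 is flagged
assert find_hierarchy_errors([2, 3, 2, 4]) == [3]
```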
## 3. Local group verification
### Added features
- Updated the `fetch_osm_fr_groups.py` script to fetch local group data from Framacalc
- Added a check that verifies whether a wiki page exists for each group
- Updated the `wiki_osm_fr_groups.html.twig` template to display the verification results
- Added filters to make browsing the groups easier
### Usage
- Groups are displayed with badges indicating their source (wiki or Framacalc)
- Groups without a wiki page are highlighted with a red badge
- Filters allow showing only the groups of a given category (all, wiki, Framacalc, with a wiki page, without a wiki page)
## Known limitations
1. **Access to external data**: Depending on the execution environment, the scripts may have trouble reaching the external data sources (OSM wiki, Framacalc).
2. **Hierarchy detection**: Incorrect hierarchy detection relies only on heading levels and does not take content or semantics into account.
3. **Group matching**: Framacalc groups are matched to wiki pages through an approximate name comparison, which can occasionally produce inaccurate results (a matching sketch is shown after this list).
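The exact matching routine is not documented here; the snippet below is a minimal sketch of one way to do such approximate name matching with `difflib`, using hypothetical group and page names.
```python
from difflib import SequenceMatcher

def best_wiki_match(group_name, wiki_page_titles, threshold=0.6):
    """Return the wiki page title most similar to a Framacalc group name,
    or None if nothing reaches the similarity threshold."""
    def similarity(a, b):
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    best = max(wiki_page_titles, key=lambda title: similarity(group_name, title), default=None)
    if best is not None and similarity(group_name, best) >= threshold:
        return best
    return None

# Hypothetical usage: best_wiki_match("Paris", ["FR:Paris", "FR:Lyon"]) -> "FR:Paris"
```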
## Future maintenance
### Python scripts
- The Python scripts live in the `wiki_compare/` directory
- They can be run manually or through cron jobs
- The `--dry-run` option tests the scripts without modifying any files
- The `--force` option forces an update even when the cache is still fresh
### Twig templates
- The templates live in the `templates/admin/` directory
- `wiki_recent_changes.html.twig`: displays the recent changes
- `wiki_compare.html.twig`: compares wiki pages, with hierarchy validation
- `wiki_osm_fr_groups.html.twig`: displays the local groups with wiki page verification
### Controller
- The `WikiController.php` controller contains all the routes and the processing logic
- The `detectHeadingHierarchyErrors()` method can be adjusted to change the hierarchy validation rules
- The data refresh methods (`refreshRecentChangesData()`, etc.) can be modified to adjust the update frequency
# Recent changes - 2025-08-22
## Improvements to the "Pages missing in French" page
- Added a button to copy the English page titles in MediaWiki format
- Implemented client-side scraping in JavaScript to extract the titles
- Added a per-page decrepitude score
- Displayed the decrepitude score as a colored progress bar
## Fixes to the "OpenStreetMap Wiki recent changes" page
- Updated the HTML parsing logic to cope with the different wiki page structures
- Made the script more robust by trying several selectors for each element
- Added fallback methods for extracting change information
## Technical details
### Decrepitude score
The decrepitude score is now computed individually for each page from a hash of the page title (see the sketch after this list). This guarantees that:
- each page gets a different score
- English pages generally get a higher score (higher priority)
- scores are consistent across runs of the script
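The exact formula is not spelled out above; the snippet below is only a minimal sketch, under those three constraints, of how a title hash can be turned into a stable score. The weighting is illustrative.
```python
import hashlib

def decrepitude_score(title, is_english_page):
    """Derive a stable pseudo-score from a page title.

    The same title always yields the same score, and English pages get a
    fixed boost so they sort higher. The actual weighting used by the
    project is not documented here.
    """
    digest = hashlib.md5(title.encode("utf-8")).hexdigest()
    base = int(digest[:8], 16) % 100          # stable value in [0, 100)
    return base + (50 if is_english_page else 0)

print(decrepitude_score("En:Key:building:colour", True))  # same output on every run
```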
### Copying titles in MediaWiki format
The "Copy titles in MediaWiki format" button:
- extracts all the English page titles of the section
- formats them as MediaWiki list items (`* [[Title]]`)
- copies them to the clipboard for easy reuse (the formatting step is sketched after this list)
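The real implementation runs client-side in JavaScript; this short Python sketch only illustrates the formatting step.
```python
def to_mediawiki_list(titles):
    """Format page titles as a MediaWiki bullet list, one '* [[Title]]' per line."""
    return "\n".join(f"* [[{title}]]" for title in titles)

print(to_mediawiki_list(["En:Key:building:colour", "En:Tag:amenity=bench"]))
# * [[En:Key:building:colour]]
# * [[En:Tag:amenity=bench]]
```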
### Improved recent change detection
The recent change detection script was improved to:
- try several HTML selectors so it adapts to changes in the wiki's structure (see the sketch below)
- extract change information more robustly
- handle different versions of the recent changes page
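A minimal sketch of that selector fallback pattern with BeautifulSoup; the selectors and sample HTML here are illustrative, not the ones the script actually uses.
```python
from bs4 import BeautifulSoup

def first_match(soup, selectors):
    """Return the first element matched by any selector in order, or None."""
    for selector in selectors:
        element = soup.select_one(selector)
        if element is not None:
            return element
    return None

soup = BeautifulSoup("<ul class='special'><li>change</li></ul>", "html.parser")
line = first_match(soup, [".mw-changeslist-line", ".special li", "li"])
print(line.get_text() if line else "no match")
```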

wiki_compare/README.md Normal file

@@ -0,0 +1,577 @@
# OSM Wiki Compare
This project contains scripts that analyze OpenStreetMap wiki pages, identify those that need updates or
translations, and publish suggestions on Mastodon to encourage the community to contribute.
## Overview
The project includes eleven main scripts:
1. **wiki_compare.py**: Fetches the 50 most-used OSM keys, compares their English and French wiki pages, and
identifies those that need updating.
2. **post_outdated_page.py**: Randomly selects an outdated French wiki page and posts a message on Mastodon
suggesting that it be updated.
3. **suggest_translation.py**: Identifies English wiki pages that have no French translation and posts a
translation suggestion on Mastodon.
4. **propose_translation.py**: Selects a wiki page (the first one by default) and uses Ollama with the
"mistral:7b" model to propose a translation, which is saved to outdated_pages.json.
5. **suggest_grammar_improvements.py**: Selects a French wiki page (the first one by default) and uses grammalecte
to check its grammar and propose improvements, which are saved to outdated_pages.json.
6. **detect_suspicious_deletions.py**: Analyzes recent OSM wiki changes to detect suspicious deletions
(more than 20 characters) and records them in a JSON file for display on the website.
7. **fetch_proposals.py**: Fetches OSM tag proposals currently under vote as well as recently modified proposals,
and records them in a JSON file for display on the website. The data is cached for one hour to avoid
overly frequent requests to the wiki server.
8. **find_untranslated_french_pages.py**: Identifies French wiki pages that have no English translation
and records them in a JSON file for display on the website. The data is cached for one hour.
9. **find_pages_unavailable_in_french.py**: Scrapes the category of pages unavailable in French, handles pagination
to retrieve every page, groups them by language prefix and prioritizes pages starting with "En:". The data
is cached for one hour.
10. **fetch_osm_fr_groups.py**: Fetches information about OSM-FR working groups and local groups from the
#Pages_des_groupes_locaux section and records it in a JSON file for display on the website.
The data is cached for one hour.
11. **fetch_recent_changes.py**: Fetches recent OSM wiki changes for the French namespace, detects newly created
pages that were previously listed as unavailable in French, and records them in a JSON file for display
on the website. The data is cached for one hour.
## Installation
### Prerequisites
- Python 3.6 or later
- Pip (the Python package manager)
### Dependencies
Install the required dependencies:
```bash
pip install requests beautifulsoup4
```
To use the propose_translation.py script, you also need to install Ollama:
1. Install Ollama by following the instructions at [ollama.ai](https://ollama.ai/)
2. Download the "mistral:7b" model:
```bash
ollama pull mistral:7b
```
To use the suggest_grammar_improvements.py script, install grammalecte:
```bash
pip install grammalecte
```
## Configuration
### Mastodon API
To publish on Mastodon, you need to:
1. Create an account on a Mastodon instance
2. Create an application in your account settings to obtain an access token
3. Configure the scripts with your instance and your access token
Edit the following constant in the `post_outdated_page.py` and `suggest_translation.py` scripts:
```python
MASTODON_API_URL = "https://mastodon.instance/api/v1/statuses" # Replace with your instance
```
### Environment variables
Set the following environment variable for Mastodon authentication:
```bash
export MASTODON_ACCESS_TOKEN="your_access_token"
```
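With the instance URL and token configured, publishing boils down to an authenticated POST to the Mastodon statuses endpoint. The helper below is a minimal sketch; `post_status` is a hypothetical name, and the real scripts build the message text from the wiki data before posting.
```python
import os
import requests

MASTODON_API_URL = "https://mastodon.instance/api/v1/statuses"  # replace with your instance

def post_status(text):
    """Publish a status on Mastodon using the token from the environment."""
    token = os.environ["MASTODON_ACCESS_TOKEN"]
    response = requests.post(
        MASTODON_API_URL,
        headers={"Authorization": f"Bearer {token}"},
        data={"status": text},
    )
    response.raise_for_status()
    return response.json()
```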
## Usage
### Analyzing the wiki pages
To analyze the wiki pages and generate the data files:
```bash
./wiki_compare.py
```
This produces:
- `top_keys.json`: the 10 most-used OSM keys
- `wiki_pages.csv`: information about each wiki page
- `outdated_pages.json`: pages that need updating
- console output listing the 10 wiki pages that need updating
### Publishing an update suggestion
To randomly select an outdated French page and post a suggestion on Mastodon:
```bash
./post_outdated_page.py
```
To simulate the publication without actually posting to Mastodon (test mode):
```bash
./post_outdated_page.py --dry-run
```
### Suggesting a translation
To identify an English page without a French translation and post a suggestion on Mastodon:
```bash
./suggest_translation.py
```
To simulate the publication without actually posting to Mastodon (test mode):
```bash
./suggest_translation.py --dry-run
```
### Proposing a translation with Ollama
To select a wiki page (by default the first one in outdated_pages.json) and generate a proposed translation with Ollama:
```bash
./propose_translation.py
```
To translate a specific page by its key:
```bash
./propose_translation.py --page type
```
Note: this script requires Ollama to be installed and running locally with the "mistral:7b" model available. To install Ollama, follow the instructions at [ollama.ai](https://ollama.ai/). To download the "mistral:7b" model, run:
```bash
ollama pull mistral:7b
```
The script stores the proposed translation in the "proposed_translation" property of the matching entry in outdated_pages.json.
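The snippet below is a minimal sketch of how a script might ask a locally running Ollama server for a translation through its HTTP API; the helper name and prompt wording are illustrative, and this is not necessarily how propose_translation.py is implemented.
```python
import requests

def propose_translation(english_text):
    """Ask a locally running Ollama server to translate wiki text into French."""
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "mistral:7b",
            "prompt": f"Translate this OpenStreetMap wiki text into French:\n\n{english_text}",
            "stream": False,
        },
        timeout=300,
    )
    response.raise_for_status()
    return response.json()["response"]
```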
### Suggesting grammar improvements with grammalecte
To select a French wiki page (by default the first one that has a French version) and generate grammar improvement suggestions with grammalecte:
```bash
./suggest_grammar_improvements.py
```
To check a specific page by its key:
```bash
./suggest_grammar_improvements.py --page type
```
Note: this script requires grammalecte to be installed. To install it, run:
```bash
pip install grammalecte
```
The script stores the grammar suggestions in the "grammar_suggestions" property of the matching entry in outdated_pages.json. Symfony then uses these suggestions in the template to display possible corrections for the French version of the page in a dedicated section.
### Detecting suspicious deletions
To analyze recent OSM wiki changes and detect suspicious deletions:
```bash
./detect_suspicious_deletions.py
```
To display the detected deletions without saving them to a file (test mode):
```bash
./detect_suspicious_deletions.py --dry-run
```
### Fetching tag proposals
To fetch OSM tag proposals that are being voted on or were recently modified:
```bash
./fetch_proposals.py
```
To force a data refresh even if the cache is still fresh:
```bash
./fetch_proposals.py --force
```
To display the proposals without saving them to a file (test mode):
```bash
./fetch_proposals.py --dry-run
```
### Finding French pages without an English translation
To identify French wiki pages that have no English translation:
```bash
./find_untranslated_french_pages.py
```
To force a data refresh even if the cache is still fresh:
```bash
./find_untranslated_french_pages.py --force
```
To display the pages without saving them to a file (test mode):
```bash
./find_untranslated_french_pages.py --dry-run
```
### Finding pages unavailable in French
To identify wiki pages that have no French translation, grouped by source language:
```bash
./find_pages_unavailable_in_french.py
```
To force a data refresh even if the cache is still fresh:
```bash
./find_pages_unavailable_in_french.py --force
```
To display the pages without saving them to a file (test mode):
```bash
./find_pages_unavailable_in_french.py --dry-run
```
### Fetching the OSM-FR groups
To fetch information about OSM-FR working groups and local groups:
```bash
./fetch_osm_fr_groups.py
```
To force a data refresh even if the cache is still fresh:
```bash
./fetch_osm_fr_groups.py --force
```
To display the groups without saving them to a file (test mode):
```bash
./fetch_osm_fr_groups.py --dry-run
```
## Automation
You can automate these scripts with cron to regularly publish update and translation suggestions and to keep
the data displayed on the website up to date.
Example cron configuration for publishing suggestions and refreshing the data:
```
# Post suggestions on Mastodon
0 10 * * 1 cd /path/to/wiki_compare && ./wiki_compare.py && ./post_outdated_page.py
0 10 * * 4 cd /path/to/wiki_compare && ./wiki_compare.py && ./suggest_translation.py
# Refresh the data for the website (every 6 hours)
0 */6 * * * cd /path/to/wiki_compare && ./detect_suspicious_deletions.py
0 */6 * * * cd /path/to/wiki_compare && ./fetch_proposals.py
0 */6 * * * cd /path/to/wiki_compare && ./find_untranslated_french_pages.py
0 */6 * * * cd /path/to/wiki_compare && ./find_pages_unavailable_in_french.py
0 */6 * * * cd /path/to/wiki_compare && ./fetch_osm_fr_groups.py
# Fetch recent changes and detect newly created pages (every hour)
0 * * * * cd /path/to/wiki_compare && ./fetch_recent_changes.py
```
Note: the data refresh scripts already include a cache freshness check (1 hour), but the cron configuration
above ensures the data keeps being refreshed even if a script temporarily misbehaves.
## Data structure
### top_keys.json
Contains the 10 most-used OSM keys with their usage counts:
```json
[
{
"key": "building",
"count": 459876543
}
]
```
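A list like this can be obtained from the taginfo API. A `fetch_top_keys` function is mentioned in the Troubleshooting section below, but whether it uses this exact endpoint is an assumption; the sketch is illustrative only.
```python
import requests

def fetch_top_keys(limit=10):
    """Fetch the most-used OSM keys from the taginfo API (one possible data source)."""
    response = requests.get(
        "https://taginfo.openstreetmap.org/api/4/keys/all",
        params={"page": 1, "rp": limit, "sortname": "count_all", "sortorder": "desc"},
        timeout=30,
    )
    response.raise_for_status()
    return [
        {"key": item["key"], "count": item["count_all"]}
        for item in response.json()["data"]
    ]
```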
### wiki_pages.csv
Contains information about each wiki page:
```
key,language,url,last_modified,sections,word_count
building,en,https://wiki.openstreetmap.org/wiki/Key:building,2023-05-15,12,3500
building,fr,https://wiki.openstreetmap.org/wiki/FR:Key:building,2022-01-10,10,2800
...
```
### outdated_pages.json
Contains detailed information about the pages that need updating:
```json
[
{
"key": "building",
"reason": "French page outdated by 491 days",
"en_page": {},
"fr_page": {},
"date_diff": 491,
"word_diff": 700,
"section_diff": 2,
"priority": 250.5,
"proposed_translation": "Texte de la traduction proposée...",
"grammar_suggestions": [
{
"paragraph": 1,
"start": 45,
"end": 52,
"type": "ACCORD",
"message": "Accord avec le nom : « bâtiments » est masculin pluriel.",
"suggestions": ["grands"],
"context": "...les grandes bâtiments de la ville..."
},
{
"paragraph": 3,
"start": 120,
"end": 128,
"type": "CONJUGAISON",
"message": "Conjugaison erronée. Accord avec « ils ».",
"suggestions": ["peuvent"],
"context": "...les bâtiments peut être classés..."
}
]
},
{
"key": "amenity",
"reason": "French page missing",
"en_page": {},
"fr_page": null,
"date_diff": 0,
"word_diff": 4200,
"section_diff": 15,
"priority": 100
}
]
```
### suspicious_deletions.json
Contains information about suspicious deletions detected in recent OSM wiki changes:
```json
{
"last_updated": "2025-08-22T15:03:03.616532",
"deletions": [
{
"page_title": "FR:Key:roof:shape",
"page_url": "https://wiki.openstreetmap.org/wiki/FR:Key:roof:shape",
"deletion_size": -286,
"timestamp": "22 août 2025 à 14:15",
"user": "RubenKelevra",
"comment": "Suppression de contenu obsolète"
},
{
"page_title": "FR:Key:sport",
"page_url": "https://wiki.openstreetmap.org/wiki/FR:Key:sport",
"deletion_size": -240,
"timestamp": "21 août 2025 à 09:30",
"user": "Computae",
"comment": "Mise à jour de la documentation"
}
]
}
```
### proposals.json
Contains information about OSM tag proposals that are being voted on or were recently modified:
```json
{
"last_updated": "2025-08-22T15:09:49.905332",
"voting_proposals": [
{
"title": "Proposal:Man made=ceremonial gate",
"url": "https://wiki.openstreetmap.org/wiki/Proposal:Man_made%3Dceremonial_gate",
"status": "Voting",
"type": "voting"
},
{
"title": "Proposal:Developer",
"url": "https://wiki.openstreetmap.org/wiki/Proposal:Developer",
"status": "Voting",
"type": "voting"
}
],
"recent_proposals": [
{
"title": "Proposal:Landuse=brownfield",
"url": "https://wiki.openstreetmap.org/wiki/Proposal:Landuse=brownfield",
"last_modified": "22 août 2025 à 10:45",
"modified_by": "MapperUser",
"type": "recent"
}
]
}
```
### untranslated_french_pages.json
Contains information about French wiki pages that have no English translation:
```json
{
"last_updated": "2025-08-22T16:30:15.123456",
"untranslated_pages": [
{
"title": "FR:Key:building:colour",
"key": "Key:building:colour",
"url": "https://wiki.openstreetmap.org/wiki/FR:Key:building:colour",
"has_translation": false
},
{
"title": "FR:Tag:amenity=bicycle_repair_station",
"key": "Tag:amenity=bicycle_repair_station",
"url": "https://wiki.openstreetmap.org/wiki/FR:Tag:amenity=bicycle_repair_station",
"has_translation": false
}
]
}
```
### pages_unavailable_in_french.json
Contains information about wiki pages that have no French translation, grouped by source language:
```json
{
"last_updated": "2025-08-22T17:15:45.123456",
"grouped_pages": {
"En": [
{
"title": "En:Key:building:colour",
"url": "https://wiki.openstreetmap.org/wiki/En:Key:building:colour",
"language_prefix": "En",
"is_english": true,
"priority": 1
}
],
"De": [
{
"title": "De:Tag:highway=residential",
"url": "https://wiki.openstreetmap.org/wiki/De:Tag:highway=residential",
"language_prefix": "De",
"is_english": false,
"priority": 0
}
],
"Other": [
{
"title": "Tag:amenity=bicycle_repair_station",
"url": "https://wiki.openstreetmap.org/wiki/Tag:amenity=bicycle_repair_station",
"language_prefix": "Other",
"is_english": false,
"priority": 0
}
]
},
"all_pages": [
{
"title": "En:Key:building:colour",
"url": "https://wiki.openstreetmap.org/wiki/En:Key:building:colour",
"language_prefix": "En",
"is_english": true,
"priority": 1
},
{
"title": "De:Tag:highway=residential",
"url": "https://wiki.openstreetmap.org/wiki/De:Tag:highway=residential",
"language_prefix": "De",
"is_english": false,
"priority": 0
},
{
"title": "Tag:amenity=bicycle_repair_station",
"url": "https://wiki.openstreetmap.org/wiki/Tag:amenity=bicycle_repair_station",
"language_prefix": "Other",
"is_english": false,
"priority": 0
}
]
}
```
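The grouping and prioritisation step that find_pages_unavailable_in_french.py performs on page titles can be sketched as below; the prefix list is an illustrative subset, not the exact set the script recognises.
```python
from collections import defaultdict

# Illustrative subset of language prefixes; the real script may recognise more.
KNOWN_LANGUAGE_PREFIXES = {"En", "De", "Es", "It", "Nl", "Pt", "Ru", "Ja"}

def group_by_language_prefix(titles):
    """Group page titles by language prefix; 'En:' pages get priority 1."""
    grouped = defaultdict(list)
    for title in titles:
        prefix = title.split(":", 1)[0]
        if prefix not in KNOWN_LANGUAGE_PREFIXES:
            prefix = "Other"
        is_english = prefix == "En"
        grouped[prefix].append({
            "title": title,
            "language_prefix": prefix,
            "is_english": is_english,
            "priority": 1 if is_english else 0,
        })
    return dict(grouped)

group_by_language_prefix([
    "En:Key:building:colour",
    "De:Tag:highway=residential",
    "Tag:amenity=bicycle_repair_station",
])
```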
### osm_fr_groups.json
Contains information about OSM-FR working groups and local groups:
```json
{
"last_updated": "2025-08-22T16:45:30.789012",
"working_groups": [
{
"name": "Groupe Bâtiments",
"url": "https://wiki.openstreetmap.org/wiki/France/OSM-FR/Groupes_de_travail/B%C3%A2timents",
"description": "Groupe de travail sur la cartographie des bâtiments",
"category": "Cartographie",
"type": "working_group"
}
],
"local_groups": [
{
"name": "Groupe local de Paris",
"url": "https://wiki.openstreetmap.org/wiki/France/Paris",
"description": "Groupe local des contributeurs parisiens",
"type": "local_group"
}
],
"umap_url": "https://umap.openstreetmap.fr/fr/map/groupes-locaux-openstreetmap_152488"
}
```
## Troubleshooting
### Common issues
1. **Mastodon authentication error**: Check that the `MASTODON_ACCESS_TOKEN` environment variable is set
correctly and that the token is valid.
2. **Error loading JSON files**: Make sure to run `wiki_compare.py` before the other scripts so that the
required data files exist.
3. **No page to update or translate**: It is possible that every page is already up to date or translated.
Try increasing the number of keys analyzed by changing the `limit` value in the `fetch_top_keys` function of
`wiki_compare.py`.
### Logging
All the scripts use the `logging` module to record execution information. By default, logs are printed to the
console. To redirect them to a file, adjust the logging configuration in each script, for example as shown below.
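A minimal sketch of such a change; the log file name is arbitrary.
```python
import logging

logging.basicConfig(
    filename="wiki_compare.log",  # hypothetical log file name
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)
```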
## Contributing
Contributions are welcome! Feel free to open an issue or a pull request to improve these scripts.
## License
This project is released under the MIT license. See the LICENSE file for details.

wiki_compare/detect_suspicious_deletions.py Normal file

@@ -0,0 +1,252 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
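"""
detect_suspicious_deletions.py

This script scans the recent changes of the OSM wiki (namespace 202) and flags
deletions that remove more than 100 characters and more than
DELETION_THRESHOLD_PERCENT of the page content. The results are saved to
suspicious_deletions.json for display on the website.

Usage:
    python detect_suspicious_deletions.py [--dry-run]

Options:
    --dry-run    Print the detected deletions without saving them to a file

Output:
    - suspicious_deletions.json file with the detected suspicious deletions
"""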
import requests
from bs4 import BeautifulSoup
import json
import logging
import argparse
import os
import re
from datetime import datetime
from urllib.parse import urlparse, parse_qs, urlencode
# Configure logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)
# URL for recent changes in OSM Wiki (namespace 202 is for Tag pages)
RECENT_CHANGES_URL = "https://wiki.openstreetmap.org/w/index.php?hidebots=1&hidenewpages=1&hidecategorization=1&hideWikibase=1&hidelog=1&hidenewuserlog=1&namespace=202&limit=250&days=30&enhanced=1&title=Special:RecentChanges&urlversion=2"
# Threshold for suspicious deletions (percentage of total content)
DELETION_THRESHOLD_PERCENT = 5.0
# Base URL for OSM Wiki
WIKI_BASE_URL = "https://wiki.openstreetmap.org"
def fetch_recent_changes():
"""
Fetch the recent changes page from OSM Wiki
"""
logger.info(f"Fetching recent changes from {RECENT_CHANGES_URL}")
try:
response = requests.get(RECENT_CHANGES_URL)
response.raise_for_status()
return response.text
except requests.exceptions.RequestException as e:
logger.error(f"Error fetching recent changes: {e}")
return None
def fetch_page_content(page_title):
"""
Fetch the content of a wiki page to count characters
"""
url = f"{WIKI_BASE_URL}/wiki/{page_title}"
logger.info(f"Fetching page content from {url}")
try:
response = requests.get(url)
response.raise_for_status()
return response.text
except requests.exceptions.RequestException as e:
logger.error(f"Error fetching page content: {e}")
return None
def count_page_characters(html_content):
"""
Count the total number of characters in the wiki page content
"""
if not html_content:
return 0
soup = BeautifulSoup(html_content, 'html.parser')
# Find the main content div
content_div = soup.select_one('#mw-content-text')
if not content_div:
return 0
# Get all text content
text_content = content_div.get_text(strip=True)
# Count characters
char_count = len(text_content)
logger.info(f"Page has {char_count} characters")
return char_count
def generate_diff_url(page_title, oldid):
"""
Generate URL to view the diff of a specific revision
"""
return f"{WIKI_BASE_URL}/w/index.php?title={page_title}&diff=prev&oldid={oldid}"
def generate_history_url(page_title):
"""
Generate URL to view the history of a page
"""
return f"{WIKI_BASE_URL}/w/index.php?title={page_title}&action=history"
def load_existing_deletions():
"""
Load existing suspicious deletions from the JSON file
"""
output_file = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'suspicious_deletions.json')
existing_pages = set()
try:
if os.path.exists(output_file):
with open(output_file, 'r', encoding='utf-8') as f:
data = json.load(f)
if 'deletions' in data:
for deletion in data['deletions']:
if 'page_title' in deletion:
existing_pages.add(deletion['page_title'])
logger.info(f"Loaded {len(existing_pages)} existing pages from {output_file}")
else:
logger.info(f"No existing file found at {output_file}")
except Exception as e:
logger.error(f"Error loading existing deletions: {e}")
return existing_pages
def parse_suspicious_deletions(html_content):
"""
Parse the HTML content to find suspicious deletions
"""
if not html_content:
return []
# Load existing pages from the JSON file
existing_pages = load_existing_deletions()
soup = BeautifulSoup(html_content, 'html.parser')
suspicious_deletions = []
# Find all change list lines
change_lines = soup.select('.mw-changeslist .mw-changeslist-line')
logger.info(f"Found {len(change_lines)} change lines to analyze")
for line in change_lines:
# Look for deletion indicators
deletion_indicator = line.select_one('.mw-plusminus-neg')
if deletion_indicator:
# Extract the deletion size
deletion_text = deletion_indicator.text.strip()
try:
# Remove any non-numeric characters except minus sign
deletion_size = int(''.join(c for c in deletion_text if c.isdigit() or c == '-'))
# Skip if deletion size is not greater than 100 characters
if abs(deletion_size) <= 100:
logger.info(f"Skipping deletion with size {deletion_size} (not > 100 characters)")
continue
# Get the page title and URL
title_element = line.select_one('.mw-changeslist-title')
if title_element:
page_title = title_element.text.strip()
# Skip if page is already in the JSON file
if page_title in existing_pages:
logger.info(f"Skipping {page_title} (already in JSON file)")
continue
page_url = title_element.get('href', '')
if not page_url.startswith('http'):
page_url = f"{WIKI_BASE_URL}{page_url}"
# Extract oldid from the URL if available
oldid = None
if 'oldid=' in page_url:
parsed_url = urlparse(page_url)
query_params = parse_qs(parsed_url.query)
if 'oldid' in query_params:
oldid = query_params['oldid'][0]
# Fetch the page content to count characters
page_html = fetch_page_content(page_title)
total_chars = count_page_characters(page_html)
# Calculate deletion percentage
deletion_percentage = 0
if total_chars > 0:
deletion_percentage = (abs(deletion_size) / total_chars) * 100
# If deletion percentage is significant
if deletion_percentage > DELETION_THRESHOLD_PERCENT:
# Get the timestamp
timestamp_element = line.select_one('.mw-changeslist-date')
timestamp = timestamp_element.text.strip() if timestamp_element else ""
# Get the user who made the change
user_element = line.select_one('.mw-userlink')
user = user_element.text.strip() if user_element else "Unknown"
# Get the comment if available
comment_element = line.select_one('.comment')
comment = comment_element.text.strip() if comment_element else ""
# Generate diff and history URLs
diff_url = generate_diff_url(page_title, oldid) if oldid else ""
history_url = generate_history_url(page_title)
suspicious_deletions.append({
'page_title': page_title,
'page_url': page_url,
'diff_url': diff_url,
'history_url': history_url,
'deletion_size': deletion_size,
'total_chars': total_chars,
'deletion_percentage': round(deletion_percentage, 2),
'timestamp': timestamp,
'user': user,
'comment': comment
})
logger.info(f"Found suspicious deletion: {page_title} ({deletion_size} chars, {deletion_percentage:.2f}% of content)")
except ValueError:
logger.warning(f"Could not parse deletion size from: {deletion_text}")
return suspicious_deletions
def save_suspicious_deletions(suspicious_deletions):
"""
Save the suspicious deletions to a JSON file
"""
output_file = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'suspicious_deletions.json')
# Add timestamp to the data
data = {
'last_updated': datetime.now().isoformat(),
'deletions': suspicious_deletions
}
with open(output_file, 'w', encoding='utf-8') as f:
json.dump(data, f, ensure_ascii=False, indent=2)
logger.info(f"Saved {len(suspicious_deletions)} suspicious deletions to {output_file}")
return output_file
def main():
parser = argparse.ArgumentParser(description='Detect suspicious deletions in OSM Wiki recent changes')
parser.add_argument('--dry-run', action='store_true', help='Print results without saving to file')
args = parser.parse_args()
html_content = fetch_recent_changes()
if html_content:
suspicious_deletions = parse_suspicious_deletions(html_content)
if args.dry_run:
logger.info(f"Found {len(suspicious_deletions)} suspicious deletions:")
for deletion in suspicious_deletions:
logger.info(f"- {deletion['page_title']}: {deletion['deletion_size']} chars by {deletion['user']}")
else:
output_file = save_suspicious_deletions(suspicious_deletions)
logger.info(f"Results saved to {output_file}")
else:
logger.error("Failed to fetch recent changes. Exiting.")
if __name__ == "__main__":
main()

wiki_compare/fetch_archived_proposals.py Normal file

@@ -0,0 +1,697 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
fetch_archived_proposals.py
This script scrapes archived proposals from the OpenStreetMap wiki and extracts voting information.
It analyzes the voting patterns, counts votes by type (approve, oppose, abstain), and collects
information about the users who voted.
The script saves the data to a JSON file that can be used by the Symfony application.
Usage:
python fetch_archived_proposals.py [--force] [--limit N]
Options:
--force Force refresh of all proposals, even if they have already been processed
--limit N Limit processing to N proposals (default: process all proposals)
Output:
- archived_proposals.json file with voting information
"""
import argparse
import json
import logging
import os
import re
import sys
import time
from datetime import datetime
from urllib.parse import urljoin
import requests
from bs4 import BeautifulSoup, NavigableString
# Configure logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(levelname)s - %(message)s',
datefmt='%Y-%m-%d %H:%M:%S'
)
logger = logging.getLogger(__name__)
# Constants
ARCHIVED_PROPOSALS_URL = "https://wiki.openstreetmap.org/wiki/Category:Archived_proposals"
import os
SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__))
ARCHIVED_PROPOSALS_FILE = os.path.join(SCRIPT_DIR, "archived_proposals.json")
USER_AGENT = "OSM-Commerces/1.0 (https://github.com/yourusername/osm-commerces; your@email.com)"
RATE_LIMIT_DELAY = 1 # seconds between requests to avoid rate limiting
# Vote patterns
VOTE_PATTERNS = {
'approve': [
r'I\s+(?:(?:strongly|fully|completely|wholeheartedly)\s+)?(?:approve|support|agree\s+with)\s+this\s+proposal',
r'I\s+vote\s+(?:to\s+)?(?:approve|support)',
r'(?:Symbol\s+support\s+vote\.svg|Symbol_support_vote\.svg)',
],
'oppose': [
r'I\s+(?:(?:strongly|fully|completely|wholeheartedly)\s+)?(?:oppose|disagree\s+with|reject|do\s+not\s+support)\s+this\s+proposal',
r'I\s+vote\s+(?:to\s+)?(?:oppose|reject|against)',
r'(?:Symbol\s+oppose\s+vote\.svg|Symbol_oppose_vote\.svg)',
],
'abstain': [
r'I\s+(?:have\s+comments\s+but\s+)?abstain\s+from\s+voting',
r'I\s+(?:have\s+comments\s+but\s+)?(?:neither\s+approve\s+nor\s+oppose|am\s+neutral)',
r'(?:Symbol\s+abstain\s+vote\.svg|Symbol_abstain_vote\.svg)',
]
}
def parse_arguments():
"""Parse command line arguments"""
parser = argparse.ArgumentParser(description='Fetch and analyze archived OSM proposals')
parser.add_argument('--force', action='store_true', help='Force refresh of all proposals')
parser.add_argument('--limit', type=int, help='Limit processing to N proposals (default: process all)')
return parser.parse_args()
def load_existing_data():
"""Load existing archived proposals data if available"""
if os.path.exists(ARCHIVED_PROPOSALS_FILE):
try:
with open(ARCHIVED_PROPOSALS_FILE, 'r', encoding='utf-8') as f:
data = json.load(f)
logger.info(f"Loaded {len(data.get('proposals', []))} existing proposals from {ARCHIVED_PROPOSALS_FILE}")
return data
except (json.JSONDecodeError, IOError) as e:
logger.error(f"Error loading existing data: {e}")
# Return empty structure if file doesn't exist or has errors
return {
'last_updated': None,
'proposals': []
}
def save_data(data):
"""Save data to JSON file"""
try:
# Update last_updated timestamp
data['last_updated'] = datetime.now().isoformat()
with open(ARCHIVED_PROPOSALS_FILE, 'w', encoding='utf-8') as f:
json.dump(data, f, indent=2, ensure_ascii=False)
logger.info(f"Saved {len(data.get('proposals', []))} proposals to {ARCHIVED_PROPOSALS_FILE}")
except IOError as e:
logger.error(f"Error saving data: {e}")
except Exception as e:
logger.error(f"Unexpected error saving data: {e}")
def fetch_page(url):
"""Fetch a page from the OSM wiki"""
headers = {
'User-Agent': USER_AGENT
}
try:
response = requests.get(url, headers=headers)
response.raise_for_status()
return response.text
except requests.exceptions.RequestException as e:
logger.error(f"Error fetching {url}: {e}")
return None
def get_proposal_urls():
"""Get URLs of all archived proposals"""
logger.info(f"Fetching archived proposals list from {ARCHIVED_PROPOSALS_URL}")
html = fetch_page(ARCHIVED_PROPOSALS_URL)
if not html:
return []
soup = BeautifulSoup(html, 'html.parser')
# Find all links in the category pages
proposal_urls = []
# Get proposals from the main category page
category_content = soup.select_one('#mw-pages')
if category_content:
for link in category_content.select('a'):
if link.get('title') and 'Category:' not in link.get('title'):
proposal_urls.append({
'title': link.get('title'),
'url': urljoin(ARCHIVED_PROPOSALS_URL, link.get('href'))
})
# Check if there are subcategories
subcategories = soup.select('#mw-subcategories a')
for subcat in subcategories:
if 'Category:' in subcat.get('title', ''):
logger.info(f"Found subcategory: {subcat.get('title')}")
subcat_url = urljoin(ARCHIVED_PROPOSALS_URL, subcat.get('href'))
# Fetch the subcategory page
time.sleep(RATE_LIMIT_DELAY) # Respect rate limits
subcat_html = fetch_page(subcat_url)
if subcat_html:
subcat_soup = BeautifulSoup(subcat_html, 'html.parser')
subcat_content = subcat_soup.select_one('#mw-pages')
if subcat_content:
for link in subcat_content.select('a'):
if link.get('title') and 'Category:' not in link.get('title'):
proposal_urls.append({
'title': link.get('title'),
'url': urljoin(ARCHIVED_PROPOSALS_URL, link.get('href'))
})
logger.info(f"Found {len(proposal_urls)} archived proposals")
return proposal_urls
def extract_username(text):
"""Extract username from a signature line"""
# Common patterns for signatures
patterns = [
r'--\s*\[\[User:([^|\]]+)(?:\|[^\]]+)?\]\]', # --[[User:Username|Username]]
r'--\s*\[\[User:([^|\]]+)\]\]', # --[[User:Username]]
r'--\s*\[\[User talk:([^|\]]+)(?:\|[^\]]+)?\]\]', # --[[User talk:Username|Username]]
r'--\s*\[\[User talk:([^|\]]+)\]\]', # --[[User talk:Username]]
r'--\s*\[\[Special:Contributions/([^|\]]+)(?:\|[^\]]+)?\]\]', # --[[Special:Contributions/Username|Username]]
r'--\s*\[\[Special:Contributions/([^|\]]+)\]\]', # --[[Special:Contributions/Username]]
]
for pattern in patterns:
match = re.search(pattern, text)
if match:
return match.group(1).strip()
# If no match found with the patterns, try to find any username-like string
match = re.search(r'--\s*([A-Za-z0-9_-]+)', text)
if match:
return match.group(1).strip()
return None
def extract_date(text):
"""Extract date from a signature line"""
# Look for common date formats in signatures
date_patterns = [
r'(\d{1,2}:\d{2}, \d{1,2} [A-Za-z]+ \d{4})', # 15:30, 25 December 2023
r'(\d{1,2} [A-Za-z]+ \d{4} \d{1,2}:\d{2})', # 25 December 2023 15:30
r'(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})', # 2023-12-25T15:30:00
]
for pattern in date_patterns:
match = re.search(pattern, text)
if match:
return match.group(1)
return None
def determine_vote_type(text):
"""Determine the type of vote from the text"""
text_lower = text.lower()
for vote_type, patterns in VOTE_PATTERNS.items():
for pattern in patterns:
if re.search(pattern, text_lower, re.IGNORECASE):
return vote_type
return None
def extract_votes(html):
"""Extract voting information from proposal HTML"""
soup = BeautifulSoup(html, 'html.parser')
# Find the voting section
voting_section = None
for heading in soup.find_all(['h2', 'h3']):
heading_text = heading.get_text().lower()
if 'voting' in heading_text or 'votes' in heading_text or 'poll' in heading_text:
voting_section = heading
break
if not voting_section:
logger.warning("No voting section found")
return {
'approve': {'count': 0, 'users': []},
'oppose': {'count': 0, 'users': []},
'abstain': {'count': 0, 'users': []}
}
# Get the content after the voting section heading
votes_content = []
current = voting_section.next_sibling
# Collect all elements until the next heading or the end of the document
while current and not current.name in ['h2', 'h3']:
if current.name: # Skip NavigableString objects
votes_content.append(current)
current = current.next_sibling
# Process vote lists
votes = {
'approve': {'count': 0, 'users': []},
'oppose': {'count': 0, 'users': []},
'abstain': {'count': 0, 'users': []}
}
# For tracking vote dates to calculate duration
all_vote_dates = []
# Look for lists of votes
for element in votes_content:
if element.name == 'ul':
for li in element.find_all('li'):
vote_text = li.get_text()
vote_type = determine_vote_type(vote_text)
if vote_type:
username = extract_username(vote_text)
date = extract_date(vote_text)
# Extract comment by removing vote declaration and signature
comment = vote_text
# Remove vote declaration patterns
for pattern in VOTE_PATTERNS[vote_type]:
comment = re.sub(pattern, '', comment, flags=re.IGNORECASE)
# Remove signature
signature_patterns = [
r'--\s*\[\[User:[^]]+\]\].*$',
r'--\s*\[\[User talk:[^]]+\]\].*$',
r'--\s*\[\[Special:Contributions/[^]]+\]\].*$',
r'--\s*[A-Za-z0-9_-]+.*$'
]
for pattern in signature_patterns:
comment = re.sub(pattern, '', comment, flags=re.IGNORECASE)
# Clean up the comment
comment = comment.strip()
if username:
votes[vote_type]['count'] += 1
votes[vote_type]['users'].append({
'username': username,
'date': date,
'comment': comment
})
# Add date to list for duration calculation if it's valid
if date:
try:
# Try to parse the date in different formats
parsed_date = None
for date_format in [
'%H:%M, %d %B %Y', # 15:30, 25 December 2023
'%d %B %Y %H:%M', # 25 December 2023 15:30
'%Y-%m-%dT%H:%M:%S' # 2023-12-25T15:30:00
]:
try:
parsed_date = datetime.strptime(date, date_format)
break
except ValueError:
continue
if parsed_date:
all_vote_dates.append(parsed_date)
except Exception as e:
logger.warning(f"Could not parse date '{date}': {e}")
# Calculate vote duration if we have at least two dates
if len(all_vote_dates) >= 2:
all_vote_dates.sort()
first_vote = all_vote_dates[0]
last_vote = all_vote_dates[-1]
vote_duration_days = (last_vote - first_vote).days
votes['first_vote'] = first_vote.strftime('%Y-%m-%d')
votes['last_vote'] = last_vote.strftime('%Y-%m-%d')
votes['duration_days'] = vote_duration_days
return votes
def extract_proposal_metadata(html, url, original_title=None):
"""Extract metadata about the proposal"""
soup = BeautifulSoup(html, 'html.parser')
# Get title
title_element = soup.select_one('#firstHeading')
extracted_title = title_element.get_text() if title_element else "Unknown Title"
# Debug logging
logger.debug(f"Original title: '{original_title}', Extracted title: '{extracted_title}'")
# Check if the extracted title is a username or user page
# This covers both "User:Username" and other user-related pages
if (extracted_title.startswith("User:") or
"User:" in extracted_title or
"User talk:" in extracted_title) and original_title:
logger.info(f"Extracted title '{extracted_title}' appears to be a user page. Using original title '{original_title}' instead.")
title = original_title
else:
title = extracted_title
# Get last modified date
last_modified = None
footer_info = soup.select_one('#footer-info-lastmod')
if footer_info:
last_modified_text = footer_info.get_text()
match = re.search(r'(\d{1,2} [A-Za-z]+ \d{4})', last_modified_text)
if match:
last_modified = match.group(1)
# Get content element for further processing
content = soup.select_one('#mw-content-text')
# Get proposer from the page
proposer = None
# Get proposal status from the page
status = None
# Look for table rows to find proposer and status
if content:
# Look for table rows
for row in content.select('tr'):
# Check if the row has at least two cells (th and td)
cells = row.select('th, td')
if len(cells) >= 2:
# Get the header text from the first cell
header_text = cells[0].get_text().strip().lower()
# Check for "Proposed by:" to find proposer
if "proposed by" in header_text:
# Look for user link in the next cell
user_link = cells[1].select_one('a[href*="/wiki/User:"]')
if user_link:
# Extract username from the link
href = user_link.get('href', '')
# Use a dedicated variable so the proposal title is not overwritten
link_title = user_link.get('title', '')
# Try to get username from the link's title attribute first
if link_title and link_title.startswith('User:'):
proposer = link_title[5:] # Remove 'User:' prefix
# Otherwise try to extract from href
elif href:
href_match = re.search(r'/wiki/User:([^/]+)', href)
if href_match:
proposer = href_match.group(1)
# If still no proposer, use the link text
if not proposer and user_link.get_text():
proposer = user_link.get_text().strip()
logger.info(f"Found proposer in table: {proposer}")
# Check for "Proposal status:" to find status
elif "proposal status" in header_text:
# Get the status from the next cell
status_cell = cells[1]
# First try to find a link with a category title containing status
status_link = status_cell.select_one('a[title*="Category:Proposals with"]')
if status_link:
# Extract status from the title attribute
status_match = re.search(r'Category:Proposals with "([^"]+)" status', status_link.get('title', ''))
if status_match:
status = status_match.group(1)
logger.info(f"Found status in table link: {status}")
# If no status found in link, try to get text content
if not status:
status_text = status_cell.get_text().strip()
# Try to match one of the known statuses
known_statuses = [
"Draft", "Proposed", "Voting", "Post-vote", "Approved",
"Rejected", "Abandoned", "Canceled", "Obsoleted",
"Inactive", "Undefined"
]
for known_status in known_statuses:
if known_status.lower() in status_text.lower():
status = known_status
logger.info(f"Found status in table text: {status}")
break
# If no proposer found in table, try the first paragraph method
if not proposer:
first_paragraph = soup.select_one('#mw-content-text p')
if first_paragraph:
proposer_match = re.search(r'(?:proposed|created|authored)\s+by\s+\[\[User:([^|\]]+)', first_paragraph.get_text())
if proposer_match:
proposer = proposer_match.group(1)
logger.info(f"Found proposer in paragraph: {proposer}")
# Count sections, links, and words
section_count = len(soup.select('#mw-content-text h2, #mw-content-text h3, #mw-content-text h4')) if content else 0
# Count links excluding user/talk pages (voting signatures)
links = []
if content:
for link in content.select('a'):
href = link.get('href', '')
if href and not re.search(r'User:|User_talk:|Special:Contributions', href):
links.append(href)
link_count = len(links)
# Approximate word count
word_count = 0
if content:
# Get text content excluding navigation elements
for nav in content.select('.navbox, .ambox, .tmbox, .mw-editsection'):
nav.decompose()
# Also exclude the voting section to count only the proposal content
voting_section = None
for heading in content.find_all(['h2', 'h3']):
heading_text = heading.get_text().lower()
if 'voting' in heading_text or 'votes' in heading_text or 'poll' in heading_text:
voting_section = heading
break
if voting_section:
# Remove the voting section and everything after it
current = voting_section
while current:
next_sibling = current.next_sibling
# Only call decompose() if current is not a NavigableString
# NavigableString objects don't have a decompose() method
if not isinstance(current, NavigableString):
current.decompose()
current = next_sibling
# Count words in the remaining content
text = content.get_text()
word_count = len(re.findall(r'\b\w+\b', text))
return {
'title': title,
'url': url,
'last_modified': last_modified,
'proposer': proposer,
'status': status,
'section_count': section_count,
'link_count': link_count,
'word_count': word_count
}
def process_proposal(proposal, force=False):
"""Process a single proposal and extract voting information"""
url = proposal['url']
title = proposal['title']
logger.info(f"Processing proposal: {title}")
# Fetch the proposal page
html = fetch_page(url)
if not html:
return None
# Extract metadata
metadata = extract_proposal_metadata(html, url, original_title=title)
# Extract votes
votes = extract_votes(html)
# Combine metadata and votes
result = {**metadata, 'votes': votes}
# Calculate total votes and percentages
total_votes = votes['approve']['count'] + votes['oppose']['count'] + votes['abstain']['count']
if total_votes > 0:
result['total_votes'] = total_votes
result['approve_percentage'] = round((votes['approve']['count'] / total_votes) * 100, 1)
result['oppose_percentage'] = round((votes['oppose']['count'] / total_votes) * 100, 1)
result['abstain_percentage'] = round((votes['abstain']['count'] / total_votes) * 100, 1)
else:
result['total_votes'] = 0
result['approve_percentage'] = 0
result['oppose_percentage'] = 0
result['abstain_percentage'] = 0
return result
def main():
"""Main function to execute the script"""
args = parse_arguments()
force = args.force
limit = args.limit
logger.info("Starting fetch_archived_proposals.py")
if limit:
logger.info(f"Processing limited to {limit} proposals")
# Load existing data
data = load_existing_data()
# Get list of proposal URLs
proposal_urls = get_proposal_urls()
# Apply limit if specified
if limit and limit < len(proposal_urls):
logger.info(f"Limiting processing from {len(proposal_urls)} to {limit} proposals")
proposal_urls = proposal_urls[:limit]
# Create a map of existing proposals by URL for quick lookup
existing_proposals = {p['url']: p for p in data.get('proposals', [])}
# Process each proposal
new_proposals = []
processed_count = 0
for proposal in proposal_urls:
url = proposal['url']
original_title = proposal['title']
# Skip if already processed and not forcing refresh
if url in existing_proposals and not force:
logger.info(f"Skipping already processed proposal: {original_title}")
new_proposals.append(existing_proposals[url])
continue
# Process the proposal
time.sleep(RATE_LIMIT_DELAY) # Respect rate limits
processed = process_proposal(proposal, force)
if processed:
# Ensure the title is preserved from the original proposal
if processed.get('title') != original_title:
# Check if the title contains "User:" - if it does, we've already handled it in extract_proposal_metadata
# and don't need to log a warning
if "User:" in processed.get('title', ''):
logger.debug(f"Title contains 'User:' - already handled in extract_proposal_metadata")
else:
logger.warning(f"Title changed during processing from '{original_title}' to '{processed.get('title')}'. Restoring original title.")
processed['title'] = original_title
new_proposals.append(processed)
processed_count += 1
# Check if we've reached the limit
if limit and processed_count >= limit:
logger.info(f"Reached limit of {limit} processed proposals")
break
# Update the data
data['proposals'] = new_proposals
# Calculate global statistics
total_proposals = len(new_proposals)
total_votes = sum(p.get('total_votes', 0) for p in new_proposals)
# Calculate votes per proposal statistics, excluding proposals with 0 votes
proposals_with_votes = [p for p in new_proposals if p.get('total_votes', 0) > 0]
num_proposals_with_votes = len(proposals_with_votes)
if num_proposals_with_votes > 0:
# Calculate average votes per proposal (excluding proposals with 0 votes)
votes_per_proposal = [p.get('total_votes', 0) for p in proposals_with_votes]
avg_votes_per_proposal = round(sum(votes_per_proposal) / num_proposals_with_votes, 1)
# Calculate median votes per proposal
votes_per_proposal.sort()
if num_proposals_with_votes % 2 == 0:
# Even number of proposals, average the middle two
median_votes_per_proposal = round((votes_per_proposal[num_proposals_with_votes // 2 - 1] +
votes_per_proposal[num_proposals_with_votes // 2]) / 2, 1)
else:
# Odd number of proposals, take the middle one
median_votes_per_proposal = votes_per_proposal[num_proposals_with_votes // 2]
# Calculate standard deviation of votes per proposal
mean = sum(votes_per_proposal) / num_proposals_with_votes
variance = sum((x - mean) ** 2 for x in votes_per_proposal) / num_proposals_with_votes
std_dev_votes_per_proposal = round((variance ** 0.5), 1)
else:
avg_votes_per_proposal = 0
median_votes_per_proposal = 0
std_dev_votes_per_proposal = 0
# Count unique voters
all_voters = set()
for p in new_proposals:
for vote_type in ['approve', 'oppose', 'abstain']:
for user in p.get('votes', {}).get(vote_type, {}).get('users', []):
if 'username' in user:
all_voters.add(user['username'])
# Find most active voters
voter_counts = {}
for p in new_proposals:
for vote_type in ['approve', 'oppose', 'abstain']:
for user in p.get('votes', {}).get(vote_type, {}).get('users', []):
if 'username' in user:
username = user['username']
if username not in voter_counts:
voter_counts[username] = {'total': 0, 'approve': 0, 'oppose': 0, 'abstain': 0}
voter_counts[username]['total'] += 1
voter_counts[username][vote_type] += 1
# Sort voters by total votes
top_voters = sorted(
[{'username': k, **v} for k, v in voter_counts.items()],
key=lambda x: x['total'],
reverse=True
)[:100] # Top 100 voters
# Count proposals by status
status_counts = {}
for p in new_proposals:
status = p.get('status')
if status:
status_counts[status] = status_counts.get(status, 0) + 1
else:
status_counts['Unknown'] = status_counts.get('Unknown', 0) + 1
# Ensure status_counts is never empty
if not status_counts:
status_counts['No Status'] = 0
# Calculate average vote duration
proposals_with_duration = [p for p in new_proposals if 'votes' in p and 'duration_days' in p['votes']]
avg_vote_duration = 0
if proposals_with_duration:
total_duration = sum(p['votes']['duration_days'] for p in proposals_with_duration)
avg_vote_duration = round(total_duration / len(proposals_with_duration), 1)
# Add statistics to the data
data['statistics'] = {
'total_proposals': total_proposals,
'total_votes': total_votes,
'avg_votes_per_proposal': avg_votes_per_proposal,
'median_votes_per_proposal': median_votes_per_proposal,
'std_dev_votes_per_proposal': std_dev_votes_per_proposal,
'avg_vote_duration_days': avg_vote_duration,
'unique_voters': len(all_voters),
'top_voters': top_voters,
'status_distribution': status_counts
}
# Save the data
save_data(data)
logger.info("Script completed successfully")
if __name__ == "__main__":
main()

wiki_compare/fetch_osm_fr_groups.py Normal file

@@ -0,0 +1,517 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
fetch_osm_fr_groups.py
This script fetches information about OSM-FR local groups from two sources:
1. The OpenStreetMap wiki page for France/OSM-FR (specifically the #Pages_des_groupes_locaux section)
2. The Framacalc spreadsheet at https://framacalc.org/osm-groupes-locaux
It then verifies that each group from the Framacalc has a corresponding wiki page.
Usage:
python fetch_osm_fr_groups.py [--dry-run] [--force]
Options:
--dry-run Run the script without saving the results to a file
--force Force update even if the cache is still fresh (less than 1 hour old)
Output:
- osm_fr_groups.json: JSON file with information about OSM-FR local groups
- Log messages about the scraping process and results
"""
import json
import argparse
import logging
import os
import csv
import io
from datetime import datetime, timedelta
import requests
from bs4 import BeautifulSoup
# Configure logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(levelname)s - %(message)s',
datefmt='%Y-%m-%d %H:%M:%S'
)
logger = logging.getLogger(__name__)
# Constants
OUTPUT_FILE = "osm_fr_groups.json"
BASE_URL = "https://wiki.openstreetmap.org/wiki/France/OSM-FR"
WIKI_BASE_URL = "https://wiki.openstreetmap.org"
FRAMACALC_URL = "https://framacalc.org/osm-groupes-locaux/export/csv"
WIKI_GROUPS_URL = "https://wiki.openstreetmap.org/wiki/France/OSM-FR#Groupes_locaux"
CACHE_DURATION = timedelta(hours=1) # Cache duration of 1 hour
def is_cache_fresh():
"""
Check if the cache file exists and is less than CACHE_DURATION old
Returns:
bool: True if cache is fresh, False otherwise
"""
if not os.path.exists(OUTPUT_FILE):
return False
try:
with open(OUTPUT_FILE, 'r', encoding='utf-8') as f:
data = json.load(f)
last_updated = datetime.fromisoformat(data.get('last_updated', '2000-01-01T00:00:00'))
now = datetime.now()
return (now - last_updated) < CACHE_DURATION
except (IOError, json.JSONDecodeError, ValueError) as e:
logger.error(f"Error checking cache freshness: {e}")
return False
def get_page_content(url):
"""
Get the HTML content of a page
Args:
url (str): URL to fetch
Returns:
str: HTML content of the page or None if request failed
"""
try:
response = requests.get(url)
response.raise_for_status()
return response.text
except requests.exceptions.RequestException as e:
logger.error(f"Error fetching {url}: {e}")
return None
def extract_working_groups(html_content):
"""
Extract working groups from the wiki page HTML
Args:
html_content (str): HTML content of the wiki page
Returns:
list: List of working group dictionaries
"""
if not html_content:
return []
soup = BeautifulSoup(html_content, 'html.parser')
working_groups = []
# Find the working groups section
working_groups_section = None
for heading in soup.find_all(['h2', 'h3']):
if heading.get_text().strip() == 'Groupes de travail' or 'Groupes_de_travail' in heading.get_text():
working_groups_section = heading
break
if not working_groups_section:
logger.warning("Could not find working groups section")
# Return an empty list but with a default category
return []
# Get the content following the heading until the next heading
current = working_groups_section.next_sibling
while current and not current.name in ['h2', 'h3']:
if current.name == 'ul':
# Process list items
for li in current.find_all('li', recursive=False):
link = li.find('a')
if link:
name = link.get_text().strip()
url = WIKI_BASE_URL + link.get('href') if link.get('href').startswith('/') else link.get('href')
# Extract description (text after the link)
description = ""
next_node = link.next_sibling
while next_node:
if isinstance(next_node, str):
description += next_node.strip()
next_node = next_node.next_sibling if hasattr(next_node, 'next_sibling') else None
description = description.strip(' :-,')
working_groups.append({
"name": name,
"url": url,
"description": description,
"category": "Général",
"type": "working_group"
})
current = current.next_sibling
logger.info(f"Found {len(working_groups)} working groups")
return working_groups
def extract_local_groups_from_wiki(html_content):
"""
Extract local groups from the wiki page HTML
Args:
html_content (str): HTML content of the wiki page
Returns:
list: List of local group dictionaries
"""
if not html_content:
return []
soup = BeautifulSoup(html_content, 'html.parser')
local_groups = []
# Find the local groups section
local_groups_section = None
for heading in soup.find_all(['h2', 'h3']):
if heading.get_text().strip() == 'Groupes locaux' or 'Pages des groupes locaux' in heading.get_text():
local_groups_section = heading
break
if not local_groups_section:
logger.warning("Could not find local groups section")
return []
# Get the content following the heading until the next heading
current = local_groups_section.next_sibling
while current and current.name not in ['h2', 'h3']:
if current.name == 'ul':
# Process list items
for li in current.find_all('li', recursive=False):
link = li.find('a')
if link:
name = link.get_text().strip()
href = link.get('href', '')
url = WIKI_BASE_URL + href if href.startswith('/') else href
# Extract description (text after the link)
description = ""
next_node = link.next_sibling
while next_node:
if isinstance(next_node, str):
description += next_node.strip()
next_node = next_node.next_sibling if hasattr(next_node, 'next_sibling') else None
description = description.strip(' :-,')
local_groups.append({
"name": name,
"url": url,
"description": description,
"type": "local_group",
"source": "wiki"
})
current = current.next_sibling
logger.info(f"Found {len(local_groups)} local groups from wiki")
return local_groups
def fetch_framacalc_data():
"""
Fetch local groups data from Framacalc
Returns:
list: List of local group dictionaries from Framacalc
"""
try:
response = requests.get(FRAMACALC_URL)
response.raise_for_status()
# Parse CSV data
csv_data = csv.reader(io.StringIO(response.text))
rows = list(csv_data)
# Check if we have data
if len(rows) < 2:
logger.warning("No data found in Framacalc CSV")
return []
# Extract headers (first row)
headers = rows[0]
# Find the indices of important columns
name_idx = -1
contact_idx = -1
website_idx = -1
for i, header in enumerate(headers):
header_lower = header.lower()
if 'nom' in header_lower or 'groupe' in header_lower:
name_idx = i
elif 'contact' in header_lower or 'email' in header_lower:
contact_idx = i
elif 'site' in header_lower or 'web' in header_lower:
website_idx = i
if name_idx == -1:
logger.warning("Could not find name column in Framacalc CSV")
return []
# Process data rows
local_groups = []
for row in rows[1:]: # Skip header row
if len(row) <= name_idx or not row[name_idx].strip():
continue # Skip empty rows
name = row[name_idx].strip()
contact = row[contact_idx].strip() if contact_idx != -1 and contact_idx < len(row) else ""
website = row[website_idx].strip() if website_idx != -1 and website_idx < len(row) else ""
local_groups.append({
"name": name,
"contact": contact,
"website": website,
"type": "local_group",
"source": "framacalc",
"has_wiki_page": False, # Will be updated later
"wiki_url": "" # Will be updated later
})
logger.info(f"Found {len(local_groups)} local groups from Framacalc")
return local_groups
except requests.exceptions.RequestException as e:
logger.error(f"Error fetching Framacalc data: {e}")
return []
except Exception as e:
logger.error(f"Error processing Framacalc data: {e}")
return []
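# Illustrative example (assumed CSV layout, the real Framacalc headers may differ):
# a header row such as "Nom du groupe, Contact, Site web" would map the columns to
# name_idx=0, contact_idx=1 and website_idx=2 via the keyword checks above; rows
# whose name cell is empty are skipped.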
def extract_wiki_group_links():
"""
Extract links to local group wiki pages from the OSM-FR wiki page
Returns:
dict: Dictionary mapping group names to wiki URLs
"""
try:
# Get the wiki page content
response = requests.get(WIKI_GROUPS_URL)
response.raise_for_status()
soup = BeautifulSoup(response.text, 'html.parser')
wiki_links = {}
# Find the "Pages des groupes locaux" section
pages_section = None
for heading in soup.find_all(['h2', 'h3', 'h4']):
if 'Pages des groupes locaux' in heading.get_text():
pages_section = heading
break
if not pages_section:
logger.warning("Could not find 'Pages des groupes locaux' section")
return {}
# Get the content following the heading until the next heading
current = pages_section.next_sibling
while current and current.name not in ['h2', 'h3', 'h4']:
if current.name == 'ul':
# Process list items
for li in current.find_all('li', recursive=False):
text = li.get_text().strip()
link = li.find('a')
if link and text:
# Extract group name (before the comma)
parts = text.split(',', 1)
group_name = parts[0].strip()
href = link.get('href', '')
url = WIKI_BASE_URL + href if href.startswith('/') else href
wiki_links[group_name] = url
current = current.next_sibling
logger.info(f"Found {len(wiki_links)} wiki links for local groups")
return wiki_links
except requests.exceptions.RequestException as e:
logger.error(f"Error fetching wiki group links: {e}")
return {}
except Exception as e:
logger.error(f"Error processing wiki group links: {e}")
return {}
def verify_framacalc_groups_have_wiki(framacalc_groups, wiki_links):
"""
Verify that each group from Framacalc has a corresponding wiki page
Args:
framacalc_groups (list): List of local group dictionaries from Framacalc
wiki_links (dict): Dictionary mapping group names to wiki URLs
Returns:
list: Updated list of local group dictionaries with wiki verification
"""
for group in framacalc_groups:
group_name = group['name']
# Try to find a matching wiki link
found = False
for wiki_name, wiki_url in wiki_links.items():
# Check if the group name is similar to the wiki name
if group_name.lower() in wiki_name.lower() or wiki_name.lower() in group_name.lower():
group['has_wiki_page'] = True
group['wiki_url'] = wiki_url
found = True
break
if not found:
group['has_wiki_page'] = False
group['wiki_url'] = ""
return framacalc_groups
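# Illustrative example (names are made up): with a Framacalc group named "OSM Lyon"
# and a wiki link entry {"Lyon": "https://wiki.openstreetmap.org/wiki/FR:Lyon"}, the
# bidirectional substring test above matches because "lyon" is contained in
# "osm lyon", so has_wiki_page becomes True and wiki_url is filled in. This fuzzy
# matching can of course produce false positives for similarly named groups.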
def extract_umap_url(html_content):
"""
Extract the uMap URL for OSM-FR local groups
Args:
html_content (str): HTML content of the wiki page
Returns:
str: uMap URL or None if not found
"""
if not html_content:
return None
soup = BeautifulSoup(html_content, 'html.parser')
# Look for links to umap.openstreetmap.fr
for link in soup.find_all('a'):
href = link.get('href', '')
if 'umap.openstreetmap.fr' in href and 'groupes-locaux' in href:
return href
return None
def save_results(wiki_local_groups, framacalc_groups, working_groups, umap_url, wiki_links, dry_run=False):
"""
Save the results to a JSON file
Args:
wiki_local_groups (list): List of local group dictionaries from wiki
framacalc_groups (list): List of local group dictionaries from Framacalc
working_groups (list): List of working group dictionaries
umap_url (str): URL to the uMap for local groups
wiki_links (dict): Dictionary mapping group names to wiki URLs
dry_run (bool): If True, don't actually save to file
Returns:
bool: True if saving was successful or dry run, False otherwise
"""
if dry_run:
logger.info("DRY RUN: Would have saved results to file")
logger.info(f"Wiki local groups: {len(wiki_local_groups)}")
for group in wiki_local_groups[:5]: # Show only first 5 for brevity
logger.info(f" - {group['name']}: {group['url']}")
logger.info(f"Framacalc groups: {len(framacalc_groups)}")
for group in framacalc_groups[:5]: # Show only first 5 for brevity
wiki_status = "Has wiki page" if group.get('has_wiki_page') else "No wiki page"
logger.info(f" - {group['name']}: {wiki_status}")
logger.info(f"Working groups: {len(working_groups)}")
for group in working_groups[:5]: # Show only first 5 for brevity
logger.info(f" - {group['name']}: {group['url']}")
if umap_url:
logger.info(f"uMap URL: {umap_url}")
logger.info(f"Wiki links: {len(wiki_links)}")
return True
# Combine all local groups
all_local_groups = wiki_local_groups + framacalc_groups
# Prepare the data structure
data = {
"last_updated": datetime.now().isoformat(),
"local_groups": all_local_groups,
"working_groups": working_groups,
"umap_url": umap_url,
"wiki_links": wiki_links
}
try:
with open(OUTPUT_FILE, 'w', encoding='utf-8') as f:
json.dump(data, f, indent=2, ensure_ascii=False)
logger.info(f"Successfully saved {len(all_local_groups)} local groups and {len(working_groups)} working groups to {OUTPUT_FILE}")
return True
except IOError as e:
logger.error(f"Error saving results to {OUTPUT_FILE}: {e}")
return False
def main():
"""Main function to execute the script"""
parser = argparse.ArgumentParser(description="Fetch OSM-FR local groups from wiki and Framacalc")
parser.add_argument("--dry-run", action="store_true", help="Run without saving results to file")
parser.add_argument("--force", action="store_true", help="Force update even if cache is fresh")
args = parser.parse_args()
logger.info("Starting fetch_osm_fr_groups.py")
# Check if cache is fresh
if is_cache_fresh() and not args.force:
logger.info(f"Cache is still fresh (less than {CACHE_DURATION.total_seconds()/3600} hours old)")
logger.info(f"Use --force to update anyway")
return
# Get the wiki page content
html_content = get_page_content(BASE_URL)
if not html_content:
logger.error("Failed to get wiki page content")
return
# Extract local groups from wiki
wiki_local_groups = extract_local_groups_from_wiki(html_content)
if not wiki_local_groups:
logger.warning("No local groups found in wiki")
# Extract working groups
working_groups = extract_working_groups(html_content)
if not working_groups:
logger.warning("No working groups found")
# Initialize with an empty list to avoid errors in the controller
working_groups = []
# Extract uMap URL
umap_url = extract_umap_url(html_content)
# Fetch local groups from Framacalc
framacalc_groups = fetch_framacalc_data()
if not framacalc_groups:
logger.warning("No local groups found in Framacalc")
# Extract wiki group links
wiki_links = extract_wiki_group_links()
if not wiki_links:
logger.warning("No wiki links found for local groups")
# Verify Framacalc groups have wiki pages
if framacalc_groups and wiki_links:
framacalc_groups = verify_framacalc_groups_have_wiki(framacalc_groups, wiki_links)
# Count groups with and without wiki pages
groups_with_wiki = sum(1 for group in framacalc_groups if group.get('has_wiki_page'))
groups_without_wiki = sum(1 for group in framacalc_groups if not group.get('has_wiki_page'))
logger.info(f"Framacalc groups with wiki pages: {groups_with_wiki}")
logger.info(f"Framacalc groups without wiki pages: {groups_without_wiki}")
# Save results
success = save_results(wiki_local_groups, framacalc_groups, working_groups, umap_url, wiki_links, args.dry_run)
if success:
logger.info("Script completed successfully")
else:
logger.error("Script completed with errors")
if __name__ == "__main__":
main()

392
wiki_compare/fetch_proposals.py Executable file
@ -0,0 +1,392 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import requests
from bs4 import BeautifulSoup
import json
import logging
import argparse
import os
import re
import time
from datetime import datetime, timedelta
# Configure logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)
# URLs for OSM Wiki proposals
VOTING_PROPOSALS_URL = "https://wiki.openstreetmap.org/wiki/Category:Proposals_with_%22Voting%22_status"
RECENT_CHANGES_URL = "https://wiki.openstreetmap.org/w/index.php?title=Special:RecentChanges&namespace=102&limit=50" # Namespace 102 is for Proposal pages
# Output file
OUTPUT_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'proposals.json')
# Cache timeout (in hours)
CACHE_TIMEOUT = 1
# Vote patterns (same as in fetch_archived_proposals.py)
VOTE_PATTERNS = {
'approve': [
r'I\s+(?:(?:strongly|fully|completely|wholeheartedly)\s+)?(?:approve|support|agree\s+with)\s+this\s+proposal',
r'I\s+vote\s+(?:to\s+)?(?:approve|support)',
r'(?:Symbol\s+support\s+vote\.svg|Symbol_support_vote\.svg)',
],
'oppose': [
r'I\s+(?:(?:strongly|fully|completely|wholeheartedly)\s+)?(?:oppose|disagree\s+with|reject|do\s+not\s+support)\s+this\s+proposal',
r'I\s+vote\s+(?:to\s+)?(?:oppose|reject|against)',
r'(?:Symbol\s+oppose\s+vote\.svg|Symbol_oppose_vote\.svg)',
],
'abstain': [
r'I\s+(?:have\s+comments\s+but\s+)?abstain\s+from\s+voting',
r'I\s+(?:have\s+comments\s+but\s+)?(?:neither\s+approve\s+nor\s+oppose|am\s+neutral)',
r'(?:Symbol\s+abstain\s+vote\.svg|Symbol_abstain_vote\.svg)',
]
}
def should_update_cache():
"""
Check if the cache file exists and if it's older than the cache timeout
"""
if not os.path.exists(OUTPUT_FILE):
logger.info("Cache file doesn't exist, creating it")
return True
# Check file modification time
file_mtime = datetime.fromtimestamp(os.path.getmtime(OUTPUT_FILE))
now = datetime.now()
# If file is older than cache timeout, update it
if now - file_mtime > timedelta(hours=CACHE_TIMEOUT):
logger.info(f"Cache is older than {CACHE_TIMEOUT} hour(s), updating")
return True
logger.info(f"Cache is still fresh (less than {CACHE_TIMEOUT} hour(s) old)")
return False
def fetch_page(url):
"""
Fetch a page from the OSM wiki
"""
try:
response = requests.get(url)
response.raise_for_status()
return response.text
except requests.exceptions.RequestException as e:
logger.error(f"Error fetching {url}: {e}")
return None
def extract_username(text):
"""
Extract username from a signature line
"""
# Common patterns for signatures
patterns = [
r'--\s*\[\[User:([^|\]]+)(?:\|[^\]]+)?\]\]', # --[[User:Username|Username]]
r'--\s*\[\[User:([^|\]]+)\]\]', # --[[User:Username]]
r'--\s*\[\[User talk:([^|\]]+)(?:\|[^\]]+)?\]\]', # --[[User talk:Username|Username]]
r'--\s*\[\[User talk:([^|\]]+)\]\]', # --[[User talk:Username]]
r'--\s*\[\[Special:Contributions/([^|\]]+)(?:\|[^\]]+)?\]\]', # --[[Special:Contributions/Username|Username]]
r'--\s*\[\[Special:Contributions/([^|\]]+)\]\]', # --[[Special:Contributions/Username]]
]
for pattern in patterns:
match = re.search(pattern, text)
if match:
return match.group(1).strip()
# If no match found with the patterns, try to find any username-like string
match = re.search(r'--\s*([A-Za-z0-9_-]+)', text)
if match:
return match.group(1).strip()
return None
def extract_date(text):
"""
Extract date from a signature line
"""
# Look for common date formats in signatures
date_patterns = [
r'(\d{1,2}:\d{2}, \d{1,2} [A-Za-z]+ \d{4})', # 15:30, 25 December 2023
r'(\d{1,2} [A-Za-z]+ \d{4} \d{1,2}:\d{2})', # 25 December 2023 15:30
r'(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})', # 2023-12-25T15:30:00
]
for pattern in date_patterns:
match = re.search(pattern, text)
if match:
return match.group(1)
return None
def determine_vote_type(text):
"""
Determine the type of vote from the text
"""
text_lower = text.lower()
for vote_type, patterns in VOTE_PATTERNS.items():
for pattern in patterns:
if re.search(pattern, text_lower, re.IGNORECASE):
return vote_type
return None
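# Illustrative example (made-up signature): for the vote line
#   "I approve this proposal. --[[User:Alice|Alice]] 15:30, 25 December 2023"
# determine_vote_type() returns 'approve' (first pattern of VOTE_PATTERNS['approve']),
# extract_username() returns 'Alice' and extract_date() returns
# '15:30, 25 December 2023'.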
def extract_votes(html):
"""
Extract voting information from proposal HTML
"""
soup = BeautifulSoup(html, 'html.parser')
# Find the voting section
voting_section = None
for heading in soup.find_all(['h2', 'h3']):
heading_text = heading.get_text().lower()
if 'voting' in heading_text or 'votes' in heading_text or 'poll' in heading_text:
voting_section = heading
break
if not voting_section:
logger.warning("No voting section found")
return {
'approve': {'count': 0, 'users': []},
'oppose': {'count': 0, 'users': []},
'abstain': {'count': 0, 'users': []}
}
# Get the content after the voting section heading
votes_content = []
current = voting_section.next_sibling
# Collect all elements until the next heading or the end of the document
while current and current.name not in ['h2', 'h3']:
if current.name: # Skip NavigableString objects
votes_content.append(current)
current = current.next_sibling
# Process vote lists
votes = {
'approve': {'count': 0, 'users': []},
'oppose': {'count': 0, 'users': []},
'abstain': {'count': 0, 'users': []}
}
# Look for lists of votes
for element in votes_content:
if element.name == 'ul':
for li in element.find_all('li'):
vote_text = li.get_text()
vote_type = determine_vote_type(vote_text)
if vote_type:
username = extract_username(vote_text)
date = extract_date(vote_text)
if username:
votes[vote_type]['count'] += 1
votes[vote_type]['users'].append({
'username': username,
'date': date
})
return votes
def fetch_voting_proposals():
"""
Fetch proposals with "Voting" status from the OSM Wiki
"""
logger.info(f"Fetching voting proposals from {VOTING_PROPOSALS_URL}")
try:
response = requests.get(VOTING_PROPOSALS_URL)
response.raise_for_status()
soup = BeautifulSoup(response.text, 'html.parser')
proposals = []
# Find all links in the mw-pages section
links = soup.select('#mw-pages a')
for link in links:
# Skip category links and other non-proposal links
if 'Category:' in link.get('href', '') or 'Special:' in link.get('href', ''):
continue
proposal_title = link.text.strip()
proposal_url = 'https://wiki.openstreetmap.org' + link.get('href', '')
# Create a basic proposal object
proposal = {
'title': proposal_title,
'url': proposal_url,
'status': 'Voting',
'type': 'voting'
}
# Fetch the proposal page to extract voting information
logger.info(f"Fetching proposal page: {proposal_title}")
html = fetch_page(proposal_url)
if html:
# Extract voting information
votes = extract_votes(html)
# Add voting information to the proposal
proposal['votes'] = votes
# Calculate total votes and percentages
total_votes = votes['approve']['count'] + votes['oppose']['count'] + votes['abstain']['count']
if total_votes > 0:
proposal['total_votes'] = total_votes
proposal['approve_percentage'] = round((votes['approve']['count'] / total_votes) * 100, 1)
proposal['oppose_percentage'] = round((votes['oppose']['count'] / total_votes) * 100, 1)
proposal['abstain_percentage'] = round((votes['abstain']['count'] / total_votes) * 100, 1)
else:
proposal['total_votes'] = 0
proposal['approve_percentage'] = 0
proposal['oppose_percentage'] = 0
proposal['abstain_percentage'] = 0
# Extract proposer from the page
soup = BeautifulSoup(html, 'html.parser')
content = soup.select_one('#mw-content-text')
if content:
# Look for table rows with "Proposed by:" in the header cell
for row in content.select('tr'):
cells = row.select('th, td')
if len(cells) >= 2:
header_text = cells[0].get_text().strip().lower()
if "proposed by" in header_text:
user_link = cells[1].select_one('a[href*="/wiki/User:"]')
if user_link:
href = user_link.get('href', '')
title = user_link.get('title', '')
# Try to get username from title attribute first
if title and title.startswith('User:'):
proposal['proposer'] = title[5:] # Remove 'User:' prefix
# Otherwise try to extract from href
elif href:
href_match = re.search(r'/wiki/User:([^/]+)', href)
if href_match:
proposal['proposer'] = href_match.group(1)
# If still no proposer, use the link text
if 'proposer' not in proposal and user_link.get_text():
proposal['proposer'] = user_link.get_text().strip()
# Add a delay to avoid overloading the server
time.sleep(1)
proposals.append(proposal)
logger.info(f"Found {len(proposals)} voting proposals")
return proposals
except requests.exceptions.RequestException as e:
logger.error(f"Error fetching voting proposals: {e}")
return []
def fetch_recent_proposals():
"""
Fetch recently modified proposals from the OSM Wiki
"""
logger.info(f"Fetching recent changes from {RECENT_CHANGES_URL}")
try:
response = requests.get(RECENT_CHANGES_URL)
response.raise_for_status()
soup = BeautifulSoup(response.text, 'html.parser')
proposals = []
# Find all change list lines
change_lines = soup.select('.mw-changeslist .mw-changeslist-line')
for line in change_lines:
# Get the page title
title_element = line.select_one('.mw-changeslist-title')
if not title_element:
continue
page_title = title_element.text.strip()
page_url = title_element.get('href', '')
if not page_url.startswith('http'):
page_url = f"https://wiki.openstreetmap.org{page_url}"
# Get the timestamp
timestamp_element = line.select_one('.mw-changeslist-date')
timestamp = timestamp_element.text.strip() if timestamp_element else ""
# Get the user who made the change
user_element = line.select_one('.mw-userlink')
user = user_element.text.strip() if user_element else "Unknown"
# Skip if it's not a proposal page
if not page_title.startswith('Proposal:'):
continue
proposals.append({
'title': page_title,
'url': page_url,
'last_modified': timestamp,
'modified_by': user,
'type': 'recent'
})
# Limit to the 10 most recent proposals
proposals = proposals[:10]
logger.info(f"Found {len(proposals)} recently modified proposals")
return proposals
except requests.exceptions.RequestException as e:
logger.error(f"Error fetching recent proposals: {e}")
return []
def save_proposals(voting_proposals, recent_proposals):
"""
Save the proposals to a JSON file
"""
data = {
'last_updated': datetime.now().isoformat(),
'voting_proposals': voting_proposals,
'recent_proposals': recent_proposals
}
with open(OUTPUT_FILE, 'w', encoding='utf-8') as f:
json.dump(data, f, ensure_ascii=False, indent=2)
logger.info(f"Saved {len(voting_proposals)} voting proposals and {len(recent_proposals)} recent proposals to {OUTPUT_FILE}")
return OUTPUT_FILE
def main():
parser = argparse.ArgumentParser(description='Fetch OSM Wiki proposals')
parser.add_argument('--force', action='store_true', help='Force update even if cache is fresh')
parser.add_argument('--dry-run', action='store_true', help='Print results without saving to file')
args = parser.parse_args()
# Check if we should update the cache
if args.force or should_update_cache() or args.dry_run:
voting_proposals = fetch_voting_proposals()
recent_proposals = fetch_recent_proposals()
if args.dry_run:
logger.info(f"Found {len(voting_proposals)} voting proposals:")
for proposal in voting_proposals:
logger.info(f"- {proposal['title']}")
logger.info(f"Found {len(recent_proposals)} recent proposals:")
for proposal in recent_proposals:
logger.info(f"- {proposal['title']} (modified by {proposal['modified_by']} on {proposal['last_modified']})")
else:
output_file = save_proposals(voting_proposals, recent_proposals)
logger.info(f"Results saved to {output_file}")
else:
logger.info("Using cached proposals data")
if __name__ == "__main__":
main()

wiki_compare/fetch_recent_changes.py
@ -0,0 +1,635 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
fetch_recent_changes.py
This script fetches recent changes from the OpenStreetMap wiki for the French namespace
and stores the URLs of these pages. It specifically targets the recent changes page:
https://wiki.openstreetmap.org/w/index.php?hidebots=1&hidepreviousrevisions=1&hidecategorization=1&hideWikibase=1&hidelog=1&hidenewuserlog=1&namespace=202&limit=500&days=30&enhanced=1&title=Special:RecentChanges&urlversion=2
Usage:
python fetch_recent_changes.py [--dry-run] [--force]
Options:
--dry-run Run the script without saving the results to a file
--force Force update even if the cache is still fresh (less than 1 hour old)
Output:
- recent_changes.json: JSON file with information about recent changes in the French namespace
- Log messages about the scraping process and results
"""
import json
import argparse
import logging
import os
import re
import shutil
from datetime import datetime, timedelta
import requests
from bs4 import BeautifulSoup
# Configure logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(levelname)s - %(message)s',
datefmt='%Y-%m-%d %H:%M:%S'
)
logger = logging.getLogger(__name__)
# Constants
# Use the directory of this script to determine the output file path
SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__))
OUTPUT_FILE = os.path.join(SCRIPT_DIR, "recent_changes.json")
UNAVAILABLE_PAGES_FILE = os.path.join(SCRIPT_DIR, "pages_unavailable_in_french.json")
CREATED_PAGES_FILE = os.path.join(SCRIPT_DIR, "newly_created_french_pages.json")
RECENT_CHANGES_URL = "https://wiki.openstreetmap.org/w/index.php?hidebots=1&hidepreviousrevisions=1&hidecategorization=1&hideWikibase=1&hidelog=1&hidenewuserlog=1&namespace=202&limit=500&days=30&enhanced=1&title=Special:RecentChanges&urlversion=2"
WIKI_BASE_URL = "https://wiki.openstreetmap.org"
CACHE_DURATION = timedelta(hours=1) # Cache duration of 1 hour
def is_cache_fresh():
"""
Check if the cache file exists and is less than CACHE_DURATION old
Returns:
bool: True if cache is fresh, False otherwise
"""
if not os.path.exists(OUTPUT_FILE):
return False
try:
with open(OUTPUT_FILE, 'r', encoding='utf-8') as f:
data = json.load(f)
last_updated = datetime.fromisoformat(data.get('last_updated', '2000-01-01T00:00:00'))
now = datetime.now()
return (now - last_updated) < CACHE_DURATION
except (IOError, json.JSONDecodeError, ValueError) as e:
logger.error(f"Error checking cache freshness: {e}")
return False
def get_page_content(url):
"""
Get the HTML content of a page
Args:
url (str): URL to fetch
Returns:
str: HTML content of the page or None if request failed
"""
try:
response = requests.get(url)
response.raise_for_status()
return response.text
except requests.exceptions.RequestException as e:
logger.error(f"Error fetching {url}: {e}")
return None
def extract_recent_changes(html_content):
"""
Extract recent changes from the wiki page HTML
Args:
html_content (str): HTML content of the recent changes page
Returns:
list: List of recent change dictionaries
"""
if not html_content:
return []
soup = BeautifulSoup(html_content, 'html.parser')
recent_changes = []
# Find the main changeslist container
# According to the issue description, we should look for .mw-changeslist
changes_list = soup.find('div', class_='mw-changeslist')
if not changes_list:
# If still not found, look for the content area
content_div = soup.find('div', id='mw-content-text')
if content_div:
# Try to find the changeslist div
changes_list = content_div.find('div', class_='mw-changeslist')
if not changes_list:
# Log the HTML structure to help debug
logger.warning("Could not find recent changes list. HTML structure:")
body = soup.find('body')
if body:
content_area = body.find('div', id='content')
if content_area:
logger.warning(f"Content area classes: {content_area.get('class', [])}")
main_content = content_area.find('div', id='mw-content-text')
if main_content:
logger.warning(f"Main content first child: {main_content.find().name if main_content.find() else 'None'}")
return []
logger.info(f"Found changes list with tag: {changes_list.name}, classes: {changes_list.get('class', [])}")
# Process each change item - based on the actual HTML structure
# According to the debug output, the changes are in tr elements
change_items = changes_list.find_all('tr')
# If no tr elements found directly, look for tables with class mw-changeslist-line
if not change_items:
tables = changes_list.find_all('table', class_='mw-changeslist-line')
for table in tables:
trs = table.find_all('tr')
change_items.extend(trs)
logger.info(f"Found {len(change_items)} change items")
for item in change_items:
# Extract the page link from the mw-changeslist-title class
page_link = item.find('a', class_='mw-changeslist-title')
if not page_link:
# If not found with the specific class, try to find any link that might be the page link
inner_td = item.find('td', class_='mw-changeslist-line-inner')
if inner_td:
links = inner_td.find_all('a')
for link in links:
href = link.get('href', '')
if '/wiki/' in href and 'action=history' not in href and 'diff=' not in href:
page_link = link
break
if not page_link:
# Skip items without a page link (might be headers or other elements)
continue
page_name = page_link.get_text().strip()
page_url = page_link.get('href')
if not page_url.startswith('http'):
page_url = WIKI_BASE_URL + page_url
# Extract the timestamp from the mw-enhanced-rc class
timestamp_td = item.find('td', class_='mw-enhanced-rc')
timestamp = timestamp_td.get_text().strip() if timestamp_td else "Unknown"
# Extract the user from the mw-userlink class
user_link = item.find('a', class_='mw-userlink')
user = user_link.get_text().strip() if user_link else "Unknown"
# Extract the user profile URL
user_url = ""
if user_link and user_link.get('href'):
user_url = user_link.get('href')
if not user_url.startswith('http'):
user_url = WIKI_BASE_URL + user_url
# Extract the diff link
diff_url = ""
diff_link = item.find('a', class_='mw-changeslist-diff') or item.find('a', string='diff')
if diff_link and diff_link.get('href'):
diff_url = diff_link.get('href')
if not diff_url.startswith('http'):
diff_url = WIKI_BASE_URL + diff_url
# Extract the comment from the comment class
comment_span = item.find('span', class_='comment')
comment = comment_span.get_text().strip() if comment_span else ""
# Extract the change size from the mw-diff-bytes class
size_span = item.find('span', class_='mw-diff-bytes')
if size_span:
change_size = size_span.get_text().strip()
else:
# If not found, try to extract from the text
change_size = "0"
text = item.get_text()
size_matches = re.findall(r'\(\s*([+-]?\d+)\s*\)', text)
if size_matches:
change_size = size_matches[0]
# Extract text differences if diff_url is available
added_text = ""
removed_text = ""
if diff_url:
try:
# Fetch the diff page
diff_html = get_page_content(diff_url)
if diff_html:
diff_soup = BeautifulSoup(diff_html, 'html.parser')
# Find added text (ins elements)
added_elements = diff_soup.find_all('ins', class_='diffchange')
if added_elements:
added_text = ' '.join([el.get_text().strip() for el in added_elements])
# Find removed text (del elements)
removed_elements = diff_soup.find_all('del', class_='diffchange')
if removed_elements:
removed_text = ' '.join([el.get_text().strip() for el in removed_elements])
except Exception as e:
logger.error(f"Error fetching diff page {diff_url}: {e}")
recent_changes.append({
"page_name": page_name,
"page_url": page_url,
"timestamp": timestamp,
"user": user,
"user_url": user_url,
"comment": comment,
"change_size": change_size,
"diff_url": diff_url,
"added_text": added_text,
"removed_text": removed_text
})
logger.debug(f"Extracted change: {page_name} by {user}")
logger.info(f"Extracted {len(recent_changes)} recent changes")
return recent_changes
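# Illustrative example (values are made up, entry abridged) of one dictionary appended above:
#   {"page_name": "FR:Tag:amenity=bar", "page_url": "https://wiki.openstreetmap.org/wiki/FR:Tag:amenity%3Dbar",
#    "timestamp": "1 septembre 2025", "user": "SomeUser", "comment": "typo",
#    "change_size": "+42", "diff_url": "...", "added_text": "...", "removed_text": "..."}
# Note that fetching every diff page adds one extra HTTP request per change.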
def save_results(recent_changes, dry_run=False):
"""
Save the results to a JSON file
Args:
recent_changes (list): List of recent change dictionaries
dry_run (bool): If True, don't actually save to file
Returns:
bool: True if saving was successful or dry run, False otherwise
"""
if dry_run:
logger.info("DRY RUN: Would have saved results to file")
logger.info(f"Recent changes: {len(recent_changes)}")
for change in recent_changes[:5]: # Show only first 5 for brevity
logger.info(f" - {change['page_name']}: {change['page_url']} ({change['timestamp']})")
if len(recent_changes) > 5:
logger.info(f" ... and {len(recent_changes) - 5} more")
return True
# Log some details about the recent changes
logger.info(f"Preparing to save {len(recent_changes)} recent changes")
if recent_changes:
logger.info(f"First change: {recent_changes[0]['page_name']} by {recent_changes[0]['user']}")
# Prepare the data structure
data = {
"last_updated": datetime.now().isoformat(),
"recent_changes": recent_changes
}
# Get the file's last modified time before saving
before_mtime = None
if os.path.exists(OUTPUT_FILE):
before_mtime = os.path.getmtime(OUTPUT_FILE)
logger.info(f"File {OUTPUT_FILE} exists, last modified at {datetime.fromtimestamp(before_mtime)}")
try:
# Print the JSON data that we're trying to save
json_data = json.dumps(data, indent=2, ensure_ascii=False)
logger.info(f"JSON data to save (first 500 chars): {json_data[:500]}...")
# Save the data to a temporary file first
temp_file = OUTPUT_FILE + ".tmp"
logger.info(f"Writing data to temporary file {temp_file}")
with open(temp_file, 'w', encoding='utf-8') as f:
f.write(json_data)
# Check if the temporary file was created and has content
if os.path.exists(temp_file):
temp_size = os.path.getsize(temp_file)
logger.info(f"Temporary file {temp_file} created, size: {temp_size} bytes")
# Read the content of the temporary file to verify
with open(temp_file, 'r', encoding='utf-8') as f:
temp_content = f.read(500) # Read first 500 chars
logger.info(f"Temporary file content (first 500 chars): {temp_content}...")
# Move the temporary file to the final location
logger.info(f"Moving temporary file to {OUTPUT_FILE}")
shutil.move(temp_file, OUTPUT_FILE)
else:
logger.error(f"Failed to create temporary file {temp_file}")
# Check if the file was actually updated
if os.path.exists(OUTPUT_FILE):
after_mtime = os.path.getmtime(OUTPUT_FILE)
file_size = os.path.getsize(OUTPUT_FILE)
logger.info(f"File {OUTPUT_FILE} exists, size: {file_size} bytes, mtime: {datetime.fromtimestamp(after_mtime)}")
# Read the content of the file to verify
with open(OUTPUT_FILE, 'r', encoding='utf-8') as f:
file_content = f.read(500) # Read first 500 chars
logger.info(f"File content (first 500 chars): {file_content}...")
if before_mtime and after_mtime <= before_mtime:
logger.warning(f"File {OUTPUT_FILE} was not updated (mtime did not change)")
else:
logger.error(f"File {OUTPUT_FILE} does not exist after saving")
# Copy the file to the public directory
public_file = os.path.join(os.path.dirname(os.path.dirname(OUTPUT_FILE)), 'public', os.path.basename(OUTPUT_FILE))
logger.info(f"Copying {OUTPUT_FILE} to {public_file}")
shutil.copy2(OUTPUT_FILE, public_file)
# Check if the public file was created
if os.path.exists(public_file):
public_size = os.path.getsize(public_file)
logger.info(f"Public file {public_file} created, size: {public_size} bytes")
else:
logger.error(f"Failed to create public file {public_file}")
logger.info(f"Successfully saved {len(recent_changes)} recent changes to {OUTPUT_FILE}")
return True
except IOError as e:
logger.error(f"Error saving results to {OUTPUT_FILE}: {e}")
return False
def load_unavailable_pages():
"""
Load the list of pages unavailable in French
Returns:
tuple: (all_pages, grouped_pages, last_updated)
"""
if not os.path.exists(UNAVAILABLE_PAGES_FILE):
logger.warning(f"Unavailable pages file {UNAVAILABLE_PAGES_FILE} does not exist")
return [], {}, None
try:
with open(UNAVAILABLE_PAGES_FILE, 'r', encoding='utf-8') as f:
data = json.load(f)
all_pages = data.get('all_pages', [])
grouped_pages = data.get('grouped_pages', {})
last_updated = data.get('last_updated')
return all_pages, grouped_pages, last_updated
except (IOError, json.JSONDecodeError) as e:
logger.error(f"Error loading unavailable pages file: {e}")
return [], {}, None
def load_created_pages():
"""
Load the list of newly created French pages
Returns:
tuple: (created_pages, last_updated)
"""
if not os.path.exists(CREATED_PAGES_FILE):
logger.info(f"Created pages file {CREATED_PAGES_FILE} does not exist, will create it")
return [], None
try:
with open(CREATED_PAGES_FILE, 'r', encoding='utf-8') as f:
data = json.load(f)
created_pages = data.get('created_pages', [])
last_updated = data.get('last_updated')
return created_pages, last_updated
except (IOError, json.JSONDecodeError) as e:
logger.error(f"Error loading created pages file: {e}")
return [], None
def save_created_pages(created_pages, dry_run=False):
"""
Save the list of newly created French pages
Args:
created_pages (list): List of newly created French pages
dry_run (bool): If True, don't actually save to file
Returns:
bool: True if saving was successful or dry run, False otherwise
"""
if dry_run:
logger.info("DRY RUN: Would have saved created pages to file")
return True
data = {
"last_updated": datetime.now().isoformat(),
"created_pages": created_pages
}
try:
with open(CREATED_PAGES_FILE, 'w', encoding='utf-8') as f:
json.dump(data, f, indent=2, ensure_ascii=False)
logger.info(f"Successfully saved {len(created_pages)} created pages to {CREATED_PAGES_FILE}")
# Copy the file to the public directory
public_file = os.path.join(os.path.dirname(os.path.dirname(CREATED_PAGES_FILE)), 'public', os.path.basename(CREATED_PAGES_FILE))
logger.info(f"Copying {CREATED_PAGES_FILE} to {public_file}")
shutil.copy2(CREATED_PAGES_FILE, public_file)
return True
except IOError as e:
logger.error(f"Error saving created pages to {CREATED_PAGES_FILE}: {e}")
return False
def save_unavailable_pages(all_pages, grouped_pages, dry_run=False):
"""
Save the updated list of pages unavailable in French
Args:
all_pages (list): List of all unavailable pages
grouped_pages (dict): Dictionary of pages grouped by language prefix
dry_run (bool): If True, don't actually save to file
Returns:
bool: True if saving was successful or dry run, False otherwise
"""
if dry_run:
logger.info("DRY RUN: Would have saved updated unavailable pages to file")
return True
data = {
"last_updated": datetime.now().isoformat(),
"all_pages": all_pages,
"grouped_pages": grouped_pages
}
try:
with open(UNAVAILABLE_PAGES_FILE, 'w', encoding='utf-8') as f:
json.dump(data, f, indent=2, ensure_ascii=False)
logger.info(f"Successfully saved {len(all_pages)} unavailable pages to {UNAVAILABLE_PAGES_FILE}")
# Copy the file to the public directory
public_file = os.path.join(os.path.dirname(os.path.dirname(UNAVAILABLE_PAGES_FILE)), 'public', os.path.basename(UNAVAILABLE_PAGES_FILE))
logger.info(f"Copying {UNAVAILABLE_PAGES_FILE} to {public_file}")
shutil.copy2(UNAVAILABLE_PAGES_FILE, public_file)
return True
except IOError as e:
logger.error(f"Error saving unavailable pages to {UNAVAILABLE_PAGES_FILE}: {e}")
return False
def check_for_newly_created_pages(recent_changes, all_pages, grouped_pages):
"""
Check if any of the recent changes are newly created French pages that were previously in the list of pages unavailable in French
Args:
recent_changes (list): List of recent change dictionaries
all_pages (list): List of all unavailable pages
grouped_pages (dict): Dictionary of pages grouped by language prefix
Returns:
tuple: (updated_all_pages, updated_grouped_pages, newly_created_pages)
"""
newly_created_pages = []
updated_all_pages = all_pages.copy()
updated_grouped_pages = {k: v.copy() for k, v in grouped_pages.items()}
# Check each recent change
for change in recent_changes:
page_name = change['page_name']
page_url = change['page_url']
comment = change['comment'].lower()
# Check if this is a new page creation
is_new_page = "page created" in comment or "nouvelle page" in comment
if is_new_page and page_name.startswith("FR:"):
logger.info(f"Found newly created French page: {page_name}")
# Check if this page was previously in the list of unavailable pages
# We need to check if the English version of this page is in the list
en_page_name = page_name.replace("FR:", "")
# Find the English page in the list of unavailable pages
found_en_page = None
for page in all_pages:
if page['title'] == en_page_name or (page['title'].startswith("En:") and page['title'][3:] == en_page_name):
found_en_page = page
break
if found_en_page:
logger.info(f"Found corresponding English page in unavailable pages list: {found_en_page['title']}")
# Remove the English page from the list of unavailable pages
updated_all_pages.remove(found_en_page)
# Remove the English page from the grouped pages
lang_prefix = found_en_page['language_prefix']
if lang_prefix in updated_grouped_pages and found_en_page in updated_grouped_pages[lang_prefix]:
updated_grouped_pages[lang_prefix].remove(found_en_page)
# If the group is now empty, remove it
if not updated_grouped_pages[lang_prefix]:
del updated_grouped_pages[lang_prefix]
# Add the newly created page to the list
newly_created_pages.append({
"title": page_name,
"url": page_url,
"en_title": found_en_page['title'],
"en_url": found_en_page['url'],
"created_at": change['timestamp'],
"created_by": change['user'],
"comment": change['comment']
})
return updated_all_pages, updated_grouped_pages, newly_created_pages
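# Illustrative example (page names are made up): if a recent change whose comment
# contains "page created" concerns "FR:Key:building", and "En:Key:building" (or
# "Key:building") is present in all_pages, that entry is removed from both
# all_pages and its language group, and a record linking the new French page to
# its English counterpart is appended to newly_created_pages.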
def main():
"""Main function to execute the script"""
parser = argparse.ArgumentParser(description="Fetch recent changes from the OSM wiki French namespace")
parser.add_argument("--dry-run", action="store_true", help="Run without saving results to file")
parser.add_argument("--force", action="store_true", help="Force update even if cache is fresh")
parser.add_argument("--debug", action="store_true", help="Save HTML content to a file for debugging")
args = parser.parse_args()
logger.info("Starting fetch_recent_changes.py")
# Check if cache is fresh
if is_cache_fresh() and not args.force:
logger.info(f"Cache is still fresh (less than {CACHE_DURATION.total_seconds()/3600} hours old)")
logger.info(f"Use --force to update anyway")
return
# Get the recent changes page content
html_content = get_page_content(RECENT_CHANGES_URL)
if not html_content:
logger.error("Failed to get recent changes page content")
return
# Save HTML content to a file for debugging
if args.debug:
debug_file = "recent_changes_debug.html"
try:
with open(debug_file, 'w', encoding='utf-8') as f:
f.write(html_content)
logger.info(f"Saved HTML content to {debug_file} for debugging")
except IOError as e:
logger.error(f"Error saving HTML content to {debug_file}: {e}")
# Parse the HTML to find the structure
soup = BeautifulSoup(html_content, 'html.parser')
# Find the main content area
content_div = soup.find('div', id='mw-content-text')
if content_div:
logger.info(f"Found content div with id 'mw-content-text'")
# Look for elements with mw-changeslist class
changeslist_elements = content_div.find_all(class_='mw-changeslist')
logger.info(f"Found {len(changeslist_elements)} elements with class 'mw-changeslist'")
for i, element in enumerate(changeslist_elements):
logger.info(f"Element {i+1} tag: {element.name}, classes: {element.get('class', [])}")
# Look for table rows or other elements that might contain changes
rows = element.find_all('tr')
divs = element.find_all('div', class_='mw-changeslist-line')
lis = element.find_all('li')
logger.info(f" - Contains {len(rows)} tr elements")
logger.info(f" - Contains {len(divs)} div.mw-changeslist-line elements")
logger.info(f" - Contains {len(lis)} li elements")
# Check direct children
children = list(element.children)
logger.info(f" - Has {len(children)} direct children")
if children:
child_types = {}
for child in children:
if hasattr(child, 'name') and child.name:
child_type = child.name
child_types[child_type] = child_types.get(child_type, 0) + 1
logger.info(f" - Direct children types: {child_types}")
# Extract recent changes
recent_changes = extract_recent_changes(html_content)
if not recent_changes:
logger.warning("No recent changes found")
# Save results
success = save_results(recent_changes, args.dry_run)
# Check for newly created French pages
logger.info("Checking for newly created French pages...")
all_pages, grouped_pages, last_updated = load_unavailable_pages()
created_pages, created_last_updated = load_created_pages()
if all_pages and grouped_pages:
# Check for newly created pages
updated_all_pages, updated_grouped_pages, newly_created = check_for_newly_created_pages(recent_changes, all_pages, grouped_pages)
# If we found newly created pages, update both files
if newly_created:
logger.info(f"Found {len(newly_created)} newly created French pages")
# Add the newly created pages to the existing list
created_pages.extend(newly_created)
# Save the updated files
save_unavailable_pages(updated_all_pages, updated_grouped_pages, args.dry_run)
save_created_pages(created_pages, args.dry_run)
else:
logger.info("No newly created French pages found")
else:
logger.warning("Could not check for newly created French pages: unavailable pages file not found or empty")
if success:
logger.info("Script completed successfully")
else:
logger.error("Script completed with errors")
if __name__ == "__main__":
main()

wiki_compare/find_pages_unavailable_in_english.py
@ -0,0 +1,293 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
find_pages_unavailable_in_english.py
This script scrapes the OpenStreetMap wiki category "Pages unavailable in English"
to identify French pages that need translation to English. It handles pagination to get all pages,
filters for pages with "FR:" in the title, and saves them to a JSON file.
Usage:
python find_pages_unavailable_in_english.py [--dry-run] [--force]
Options:
--dry-run Run the script without saving the results to a file
--force Force update even if the cache is still fresh (less than 1 hour old)
Output:
- pages_unavailable_in_english.json: JSON file with French pages that need translation to English
- Log messages about the scraping process and results
"""
import json
import argparse
import logging
import os
import re
import random
import hashlib
import csv
from datetime import datetime, timedelta
import requests
from bs4 import BeautifulSoup
# Configure logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(levelname)s - %(message)s',
datefmt='%Y-%m-%d %H:%M:%S'
)
logger = logging.getLogger(__name__)
# Constants
OUTPUT_FILE = "pages_unavailable_in_english.json"
WIKI_PAGES_CSV = "wiki_pages.csv"
BASE_URL = "https://wiki.openstreetmap.org/wiki/Category:Pages_unavailable_in_English"
WIKI_BASE_URL = "https://wiki.openstreetmap.org"
CACHE_DURATION = timedelta(hours=1) # Cache duration of 1 hour
def read_wiki_pages_csv():
"""
Read the wiki_pages.csv file and create a mapping of URLs to description_img_url values
Returns:
dict: Dictionary mapping URLs to description_img_url values
"""
url_to_img_map = {}
try:
with open(WIKI_PAGES_CSV, 'r', newline='', encoding='utf-8') as f:
reader = csv.DictReader(f)
for row in reader:
if 'url' in row and 'description_img_url' in row and row['description_img_url']:
url_to_img_map[row['url']] = row['description_img_url']
logger.info(f"Read {len(url_to_img_map)} image URLs from {WIKI_PAGES_CSV}")
return url_to_img_map
except (IOError, csv.Error) as e:
logger.error(f"Error reading {WIKI_PAGES_CSV}: {e}")
return {}
def is_cache_fresh():
"""
Check if the cache file exists and is less than CACHE_DURATION old
Returns:
bool: True if cache is fresh, False otherwise
"""
if not os.path.exists(OUTPUT_FILE):
return False
try:
with open(OUTPUT_FILE, 'r', encoding='utf-8') as f:
data = json.load(f)
last_updated = datetime.fromisoformat(data.get('last_updated', '2000-01-01T00:00:00'))
now = datetime.now()
return (now - last_updated) < CACHE_DURATION
except (IOError, json.JSONDecodeError, ValueError) as e:
logger.error(f"Error checking cache freshness: {e}")
return False
def get_page_content(url):
"""
Get the HTML content of a page
Args:
url (str): URL to fetch
Returns:
str: HTML content of the page or None if request failed
"""
try:
response = requests.get(url)
response.raise_for_status()
return response.text
except requests.exceptions.RequestException as e:
logger.error(f"Error fetching {url}: {e}")
return None
def extract_pages_from_category(html_content, current_url):
"""
Extract pages from the category page HTML, filtering for pages with "FR:" in the title
Args:
html_content (str): HTML content of the category page
current_url (str): URL of the current page for resolving relative links
Returns:
tuple: (list of page dictionaries, next page URL or None)
"""
if not html_content:
return [], None
soup = BeautifulSoup(html_content, 'html.parser')
pages = []
# Find the category content
category_content = soup.find('div', class_='mw-category-generated')
if not category_content:
logger.warning("Could not find category content")
return [], None
# Extract pages
for link in category_content.find_all('a'):
title = link.get_text()
url = WIKI_BASE_URL + link.get('href')
# Filter for pages with "FR:" in the title
if "FR:" in title:
# Extract language prefix (should be "FR")
language_prefix = "FR"
# Calculate outdatedness score
outdatedness_score = calculate_outdatedness_score(title)
pages.append({
"title": title,
"url": url,
"language_prefix": language_prefix,
"priority": 1, # All French pages have the same priority
"outdatedness_score": outdatedness_score
})
# Find next page link
next_page_url = None
pagination = soup.find('div', class_='mw-category-generated')
if pagination:
next_link = pagination.find('a', string='next page')
if next_link:
next_page_url = WIKI_BASE_URL + next_link.get('href')
return pages, next_page_url
def scrape_all_pages():
"""
Scrape all pages from the category, handling pagination
Returns:
list: List of page dictionaries
"""
all_pages = []
current_url = BASE_URL
page_num = 1
while current_url:
logger.info(f"Scraping page {page_num}: {current_url}")
html_content = get_page_content(current_url)
if not html_content:
logger.error(f"Failed to get content for page {page_num}")
break
pages, next_url = extract_pages_from_category(html_content, current_url)
logger.info(f"Found {len(pages)} French pages on page {page_num}")
all_pages.extend(pages)
current_url = next_url
page_num += 1
if not next_url:
logger.info("No more pages to scrape")
logger.info(f"Total French pages scraped: {len(all_pages)}")
return all_pages
def calculate_outdatedness_score(title):
"""
Calculate an outdatedness score for a page based on its title
Args:
title (str): The page title
Returns:
int: An outdatedness score between 1 and 100
"""
# Use a hash of the title to generate a consistent but varied score
hash_value = int(hashlib.md5(title.encode('utf-8')).hexdigest(), 16)
# Generate a score between 1 and 100
base_score = (hash_value % 100) + 1
return base_score
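# Illustrative note: the score is derived purely from an MD5 hash of the title, e.g.
#   int(hashlib.md5("FR:Tag:amenity=bar".encode("utf-8")).hexdigest(), 16) % 100 + 1
# always yields the same value between 1 and 100 for that title. It spreads pages
# deterministically but pseudo-randomly; it does not reflect how outdated a page
# actually is.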
def save_results(pages, dry_run=False):
"""
Save the results to a JSON file
Args:
pages (list): List of page dictionaries
dry_run (bool): If True, don't actually save to file
Returns:
bool: True if saving was successful or dry run, False otherwise
"""
if dry_run:
logger.info("DRY RUN: Would have saved results to file")
return True
# Prepare the data structure
data = {
"last_updated": datetime.now().isoformat(),
"pages": pages,
"count": len(pages)
}
try:
with open(OUTPUT_FILE, 'w', encoding='utf-8') as f:
json.dump(data, f, indent=2, ensure_ascii=False)
logger.info(f"Successfully saved {len(pages)} pages to {OUTPUT_FILE}")
# Copy the file to the public directory for web access
public_dir = os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))), 'public')
if os.path.exists(public_dir):
public_file = os.path.join(public_dir, OUTPUT_FILE)
with open(public_file, 'w', encoding='utf-8') as f:
json.dump(data, f, indent=2, ensure_ascii=False)
logger.info(f"Copied {OUTPUT_FILE} to public directory")
return True
except IOError as e:
logger.error(f"Error saving results to {OUTPUT_FILE}: {e}")
return False
def main():
"""Main function to execute the script"""
parser = argparse.ArgumentParser(description="Scrape French pages unavailable in English from OSM wiki")
parser.add_argument("--dry-run", action="store_true", help="Run without saving results to file")
parser.add_argument("--force", action="store_true", help="Force update even if cache is fresh")
args = parser.parse_args()
logger.info("Starting find_pages_unavailable_in_english.py")
# Check if cache is fresh
if is_cache_fresh() and not args.force:
logger.info(f"Cache is still fresh (less than {CACHE_DURATION.total_seconds()/3600} hours old)")
logger.info(f"Use --force to update anyway")
return
# Read image URLs from wiki_pages.csv
url_to_img_map = read_wiki_pages_csv()
# Scrape pages
pages = scrape_all_pages()
if not pages:
logger.error("No pages found")
return
# Add description_img_url to pages
for page in pages:
if page["url"] in url_to_img_map:
page["description_img_url"] = url_to_img_map[page["url"]]
# Save results
success = save_results(pages, args.dry_run)
if success:
logger.info("Script completed successfully")
else:
logger.error("Script completed with errors")
if __name__ == "__main__":
main()

wiki_compare/find_pages_unavailable_in_french.py
@ -0,0 +1,329 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
find_pages_unavailable_in_french.py
This script scrapes the OpenStreetMap wiki category "Pages unavailable in French"
to identify pages that need translation. It handles pagination to get all pages,
groups them by language prefix, and prioritizes English pages starting with "En:".
Usage:
python find_pages_unavailable_in_french.py [--dry-run] [--force]
Options:
--dry-run Run the script without saving the results to a file
--force Force update even if the cache is still fresh (less than 1 hour old)
Output:
- pages_unavailable_in_french.json: JSON file with pages that need translation
- Log messages about the scraping process and results
"""
import json
import argparse
import logging
import os
import re
import random
import hashlib
import csv
from datetime import datetime, timedelta
import requests
from bs4 import BeautifulSoup
# Configure logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(levelname)s - %(message)s',
datefmt='%Y-%m-%d %H:%M:%S'
)
logger = logging.getLogger(__name__)
# Constants
OUTPUT_FILE = "pages_unavailable_in_french.json"
WIKI_PAGES_CSV = "wiki_pages.csv"
BASE_URL = "https://wiki.openstreetmap.org/wiki/Category:Pages_unavailable_in_French"
WIKI_BASE_URL = "https://wiki.openstreetmap.org"
CACHE_DURATION = timedelta(hours=1) # Cache duration of 1 hour
def read_wiki_pages_csv():
"""
Read the wiki_pages.csv file and create a mapping of URLs to description_img_url values
Returns:
dict: Dictionary mapping URLs to description_img_url values
"""
url_to_img_map = {}
try:
with open(WIKI_PAGES_CSV, 'r', newline='', encoding='utf-8') as f:
reader = csv.DictReader(f)
for row in reader:
if 'url' in row and 'description_img_url' in row and row['description_img_url']:
url_to_img_map[row['url']] = row['description_img_url']
logger.info(f"Read {len(url_to_img_map)} image URLs from {WIKI_PAGES_CSV}")
return url_to_img_map
except (IOError, csv.Error) as e:
logger.error(f"Error reading {WIKI_PAGES_CSV}: {e}")
return {}
def is_cache_fresh():
"""
Check if the cache file exists and is less than CACHE_DURATION old
Returns:
bool: True if cache is fresh, False otherwise
"""
if not os.path.exists(OUTPUT_FILE):
return False
try:
with open(OUTPUT_FILE, 'r', encoding='utf-8') as f:
data = json.load(f)
last_updated = datetime.fromisoformat(data.get('last_updated', '2000-01-01T00:00:00'))
now = datetime.now()
return (now - last_updated) < CACHE_DURATION
except (IOError, json.JSONDecodeError, ValueError) as e:
logger.error(f"Error checking cache freshness: {e}")
return False
def get_page_content(url):
"""
Get the HTML content of a page
Args:
url (str): URL to fetch
Returns:
str: HTML content of the page or None if request failed
"""
try:
response = requests.get(url)
response.raise_for_status()
return response.text
except requests.exceptions.RequestException as e:
logger.error(f"Error fetching {url}: {e}")
return None
def extract_pages_from_category(html_content, current_url):
"""
Extract pages from the category page HTML
Args:
html_content (str): HTML content of the category page
current_url (str): URL of the current page for resolving relative links
Returns:
tuple: (list of page dictionaries, next page URL or None)
"""
if not html_content:
return [], None
soup = BeautifulSoup(html_content, 'html.parser')
pages = []
# Find the category content
category_content = soup.find('div', class_='mw-category-generated')
if not category_content:
logger.warning("Could not find category content")
return [], None
# Extract pages
for link in category_content.find_all('a'):
title = link.get_text()
url = WIKI_BASE_URL + link.get('href')
# Skip pages with "FR:User:" or "FR:Réunions"
if "FR:User:" in title or "FR:Réunions" in title:
logger.info(f"Skipping excluded page: {title}")
continue
# Extract language prefix (e.g., "En", "De", etc.)
language_prefix = "Other"
match = re.match(r'^([A-Za-z]{2}):', title)
if match:
language_prefix = match.group(1)
# Check if it's an English page
is_english = language_prefix.lower() == "en"
# Set priority (English pages have higher priority)
priority = 1 if is_english else 0
# Calculate outdatedness score
outdatedness_score = calculate_outdatedness_score(title, is_english)
pages.append({
"title": title,
"url": url,
"language_prefix": language_prefix,
"is_english": is_english,
"priority": priority,
"outdatedness_score": outdatedness_score
})
# Find next page link
next_page_url = None
pagination = soup.find('div', class_='mw-category-generated')
if pagination:
next_link = pagination.find('a', string='next page')
if next_link:
next_page_url = WIKI_BASE_URL + next_link.get('href')
return pages, next_page_url
def scrape_all_pages():
"""
Scrape all pages from the category, handling pagination
Returns:
list: List of page dictionaries
"""
all_pages = []
current_url = BASE_URL
page_num = 1
while current_url:
logger.info(f"Scraping page {page_num}: {current_url}")
html_content = get_page_content(current_url)
if not html_content:
logger.error(f"Failed to get content for page {page_num}")
break
pages, next_url = extract_pages_from_category(html_content, current_url)
logger.info(f"Found {len(pages)} pages on page {page_num}")
all_pages.extend(pages)
current_url = next_url
page_num += 1
if not next_url:
logger.info("No more pages to scrape")
logger.info(f"Total pages scraped: {len(all_pages)}")
return all_pages
def calculate_outdatedness_score(title, is_english):
"""
Calculate an outdatedness score for a page based on its title
Args:
title (str): The page title
is_english (bool): Whether the page is in English
Returns:
int: An outdatedness score between 1 and 100
"""
# Use a hash of the title to generate a consistent but varied score
hash_value = int(hashlib.md5(title.encode('utf-8')).hexdigest(), 16)
# Generate a score between 1 and 100
base_score = (hash_value % 100) + 1
# English pages get a higher base score
if is_english:
base_score = min(base_score + 20, 100)
return base_score
def group_pages_by_language(pages):
"""
Group pages by language prefix
Args:
pages (list): List of page dictionaries
Returns:
dict: Dictionary with language prefixes as keys and lists of pages as values
"""
grouped = {}
for page in pages:
prefix = page["language_prefix"]
if prefix not in grouped:
grouped[prefix] = []
grouped[prefix].append(page)
# Sort each group by priority (English pages first) and then by title
for prefix in grouped:
grouped[prefix].sort(key=lambda x: (-x["priority"], x["title"]))
return grouped
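# Resulting shape (illustrative): {"En": [page, ...], "De": [...], "Other": [...]}
# Each list is ordered by priority then title; since priority is constant within a
# language group, this is effectively alphabetical by title.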
def save_results(pages, dry_run=False):
"""
Save the results to a JSON file
Args:
pages (list): List of page dictionaries
dry_run (bool): If True, don't actually save to file
Returns:
bool: True if saving was successful or dry run, False otherwise
"""
if dry_run:
logger.info("DRY RUN: Would have saved results to file")
return True
# Group pages by language prefix
grouped_pages = group_pages_by_language(pages)
# Prepare the data structure
data = {
"last_updated": datetime.now().isoformat(),
"grouped_pages": grouped_pages,
"all_pages": pages
}
try:
with open(OUTPUT_FILE, 'w', encoding='utf-8') as f:
json.dump(data, f, indent=2, ensure_ascii=False)
logger.info(f"Successfully saved {len(pages)} pages to {OUTPUT_FILE}")
return True
except IOError as e:
logger.error(f"Error saving results to {OUTPUT_FILE}: {e}")
return False
def main():
"""Main function to execute the script"""
parser = argparse.ArgumentParser(description="Scrape pages unavailable in French from OSM wiki")
parser.add_argument("--dry-run", action="store_true", help="Run without saving results to file")
parser.add_argument("--force", action="store_true", help="Force update even if cache is fresh")
args = parser.parse_args()
logger.info("Starting find_pages_unavailable_in_french.py")
# Check if cache is fresh
if is_cache_fresh() and not args.force:
logger.info(f"Cache is still fresh (less than {CACHE_DURATION.total_seconds()/3600} hours old)")
logger.info(f"Use --force to update anyway")
return
# Read image URLs from wiki_pages.csv
url_to_img_map = read_wiki_pages_csv()
# Scrape pages
pages = scrape_all_pages()
if not pages:
logger.error("No pages found")
return
# Add description_img_url to pages
for page in pages:
if page["url"] in url_to_img_map:
page["description_img_url"] = url_to_img_map[page["url"]]
# Save results
success = save_results(pages, args.dry_run)
if success:
logger.info("Script completed successfully")
else:
logger.error("Script completed with errors")
if __name__ == "__main__":
main()

212
wiki_compare/find_untranslated_french_pages.py
View file

@ -0,0 +1,212 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
find_untranslated_french_pages.py
This script scrapes the OSM wiki to find French pages that don't have translations
in other languages. It caches the results and performs the scraping at most once per hour.
Usage:
python find_untranslated_french_pages.py [--force] [--dry-run]
Options:
--force Force update even if cache is fresh
--dry-run Print results without saving to file
Output:
- untranslated_french_pages.json: JSON file containing information about French pages without translations
"""
import requests
from bs4 import BeautifulSoup
import json
import logging
import argparse
import os
from datetime import datetime, timedelta
import re
# Configure logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(levelname)s - %(message)s',
datefmt='%Y-%m-%d %H:%M:%S'
)
logger = logging.getLogger(__name__)
# Constants
OUTPUT_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'untranslated_french_pages.json')
CACHE_TIMEOUT = 1 # hours
WIKI_BASE_URL = "https://wiki.openstreetmap.org"
FRENCH_PAGES_URL = "https://wiki.openstreetmap.org/wiki/Special:AllPages?from=&to=&namespace=202&hideredirects=1&prefix=FR:"
def should_update_cache():
"""
Check if the cache file exists and if it's older than the cache timeout
Returns:
bool: True if cache should be updated, False otherwise
"""
if not os.path.exists(OUTPUT_FILE):
logger.info("Cache file doesn't exist, creating it")
return True
# Check file modification time
file_mtime = datetime.fromtimestamp(os.path.getmtime(OUTPUT_FILE))
now = datetime.now()
# If file is older than cache timeout, update it
if now - file_mtime > timedelta(hours=CACHE_TIMEOUT):
logger.info(f"Cache is older than {CACHE_TIMEOUT} hour(s), updating")
return True
logger.info(f"Cache is still fresh (less than {CACHE_TIMEOUT} hour(s) old)")
return False
def fetch_french_pages():
"""
Fetch all French pages from the OSM wiki
Returns:
list: List of dictionaries containing French page information
"""
logger.info(f"Fetching French pages from {FRENCH_PAGES_URL}")
french_pages = []
next_page_url = FRENCH_PAGES_URL
while next_page_url:
try:
response = requests.get(next_page_url)
response.raise_for_status()
soup = BeautifulSoup(response.text, 'html.parser')
# Find all links in the mw-allpages-body section
links_container = soup.select_one('.mw-allpages-body')
if links_container:
links = links_container.select('li a')
for link in links:
page_title = link.text.strip()
page_url = WIKI_BASE_URL + link.get('href', '')
# Extract the key name (remove the FR: prefix)
key_match = re.match(r'FR:(.*)', page_title)
if key_match:
key_name = key_match.group(1)
french_pages.append({
'title': page_title,
'key': key_name,
'url': page_url,
'has_translation': False # Will be updated later
})
# Check if there's a next page
next_link = soup.select_one('a.mw-nextlink')
next_page_url = WIKI_BASE_URL + next_link.get('href') if next_link else None
except requests.exceptions.RequestException as e:
logger.error(f"Error fetching French pages: {e}")
break
logger.info(f"Found {len(french_pages)} French pages")
return french_pages
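# Each entry produced above looks like this (illustrative values):
# {"title": "FR:Key:building", "key": "Key:building",
#  "url": "https://wiki.openstreetmap.org/wiki/FR:Key:building", "has_translation": False}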
def check_translations(french_pages):
"""
Check if each French page has translations in other languages
Args:
french_pages (list): List of dictionaries containing French page information
Returns:
list: Updated list with translation information
"""
logger.info("Checking for translations of French pages")
for i, page in enumerate(french_pages):
if i % 10 == 0: # Log progress every 10 pages
logger.info(f"Checking page {i+1}/{len(french_pages)}: {page['title']}")
try:
# Construct the English page URL by removing the FR: prefix
en_url = page['url'].replace('/wiki/FR:', '/wiki/')
# Check if the English page exists
response = requests.head(en_url)
# If the page returns a 200 status code, it exists
if response.status_code == 200:
page['has_translation'] = True
page['en_url'] = en_url
else:
page['has_translation'] = False
except requests.exceptions.RequestException as e:
logger.error(f"Error checking translation for {page['title']}: {e}")
# Assume no translation in case of error
page['has_translation'] = False
# Filter to only include pages without translations
untranslated_pages = [page for page in french_pages if not page['has_translation']]
logger.info(f"Found {len(untranslated_pages)} French pages without translations")
return untranslated_pages
def save_untranslated_pages(untranslated_pages):
"""
Save the untranslated pages to a JSON file
Args:
untranslated_pages (list): List of dictionaries containing untranslated page information
Returns:
str: Path to the output file
"""
data = {
'last_updated': datetime.now().isoformat(),
'untranslated_pages': untranslated_pages
}
with open(OUTPUT_FILE, 'w', encoding='utf-8') as f:
json.dump(data, f, ensure_ascii=False, indent=2)
logger.info(f"Saved {len(untranslated_pages)} untranslated pages to {OUTPUT_FILE}")
return OUTPUT_FILE
def main():
"""Main function to execute the script"""
parser = argparse.ArgumentParser(description="Find French OSM wiki pages without translations")
parser.add_argument("--force", action="store_true", help="Force update even if cache is fresh")
parser.add_argument("--dry-run", action="store_true", help="Print results without saving to file")
args = parser.parse_args()
logger.info("Starting find_untranslated_french_pages.py")
# Check if we should update the cache
if args.force or should_update_cache() or args.dry_run:
# Fetch all French pages
french_pages = fetch_french_pages()
# Check which ones don't have translations
untranslated_pages = check_translations(french_pages)
if args.dry_run:
logger.info(f"Found {len(untranslated_pages)} French pages without translations:")
for page in untranslated_pages[:10]: # Show only the first 10 in dry run
logger.info(f"- {page['title']} ({page['url']})")
if len(untranslated_pages) > 10:
logger.info(f"... and {len(untranslated_pages) - 10} more")
else:
# Save the results
output_file = save_untranslated_pages(untranslated_pages)
logger.info(f"Results saved to {output_file}")
else:
logger.info("Using cached untranslated pages data")
logger.info("Script completed successfully")
if __name__ == "__main__":
main()

242
wiki_compare/fix_grammar_suggestions.py
View file

@ -0,0 +1,242 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
fix_grammar_suggestions.py
This script adds grammar suggestions to the "type" page in the outdated_pages.json file.
It fetches the French content for the page, runs the grammar checker, and updates the file.
"""
import json
import logging
import os
import subprocess
import tempfile
import requests
from bs4 import BeautifulSoup
# Configure logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(levelname)s - %(message)s',
datefmt='%Y-%m-%d %H:%M:%S'
)
logger = logging.getLogger(__name__)
# Constants
OUTDATED_PAGES_FILE = "outdated_pages.json"
TARGET_KEY = "type"
def load_outdated_pages():
"""
Load the outdated pages from the JSON file
Returns:
dict: Dictionary containing outdated page information
"""
try:
with open(OUTDATED_PAGES_FILE, 'r', encoding='utf-8') as f:
data = json.load(f)
logger.info(f"Successfully loaded outdated pages from {OUTDATED_PAGES_FILE}")
return data
except (IOError, json.JSONDecodeError) as e:
logger.error(f"Error loading pages from {OUTDATED_PAGES_FILE}: {e}")
return None
def save_outdated_pages(data):
"""
Save the outdated pages to the JSON file
Args:
data (dict): Dictionary containing outdated page information
"""
try:
with open(OUTDATED_PAGES_FILE, 'w', encoding='utf-8') as f:
json.dump(data, f, indent=2, ensure_ascii=False)
logger.info(f"Successfully saved outdated pages to {OUTDATED_PAGES_FILE}")
except IOError as e:
logger.error(f"Error saving pages to {OUTDATED_PAGES_FILE}: {e}")
def fetch_wiki_page_content(url):
"""
Fetch the content of a wiki page
Args:
url (str): URL of the wiki page
Returns:
str: Content of the wiki page
"""
try:
logger.info(f"Fetching content from {url}")
response = requests.get(url)
response.raise_for_status()
soup = BeautifulSoup(response.text, 'html.parser')
# Get the main content
content = soup.select_one('#mw-content-text')
if content:
# Remove script and style elements
for script in content.select('script, style'):
script.extract()
# Remove .languages elements
for languages_elem in content.select('.languages'):
languages_elem.extract()
# Get text
text = content.get_text(separator=' ', strip=True)
logger.info(f"Successfully fetched content ({len(text)} characters)")
return text
else:
logger.warning(f"Could not find content in page: {url}")
return ""
except requests.exceptions.RequestException as e:
logger.error(f"Error fetching wiki page content: {e}")
return ""
def check_grammar_with_grammalecte(text):
"""
Check grammar in French text using grammalecte-cli
Args:
text (str): French text to check
Returns:
list: List of grammar suggestions
"""
if not text or len(text.strip()) == 0:
logger.warning("Empty text provided for grammar checking")
return []
logger.info("Checking grammar with grammalecte-cli...")
try:
# Create a temporary file with the text
with tempfile.NamedTemporaryFile(mode='w', encoding='utf-8', suffix='.txt', delete=False) as temp_file:
temp_file.write(text)
temp_file_path = temp_file.name
# Run grammalecte-cli on the temporary file
cmd = ['grammalecte-cli', '-f', temp_file_path, '-j', '-ctx', '-wss']
result = subprocess.run(cmd, capture_output=True, text=True, check=True)
# Parse the JSON output
grammar_data = json.loads(result.stdout)
# Extract grammar errors from all paragraphs
grammar_suggestions = []
for paragraph in grammar_data.get('data', []):
paragraph_index = paragraph.get('iParagraph', 0)
# Process grammar errors
for error in paragraph.get('lGrammarErrors', []):
suggestion = {
'paragraph': paragraph_index,
'start': error.get('nStart', 0),
'end': error.get('nEnd', 0),
'type': error.get('sType', ''),
'message': error.get('sMessage', ''),
'suggestions': error.get('aSuggestions', []),
'text': error.get('sUnderlined', ''),
'before': error.get('sBefore', ''),
'after': error.get('sAfter', '')
}
grammar_suggestions.append(suggestion)
# Process spelling errors
for error in paragraph.get('lSpellingErrors', []):
suggestion = {
'paragraph': paragraph_index,
'start': error.get('nStart', 0),
'end': error.get('nEnd', 0),
'type': 'spelling',
'message': 'Erreur d\'orthographe',
'suggestions': error.get('aSuggestions', []),
'text': error.get('sUnderlined', ''),
'before': error.get('sBefore', ''),
'after': error.get('sAfter', '')
}
grammar_suggestions.append(suggestion)
# Clean up the temporary file
os.unlink(temp_file_path)
logger.info(f"Found {len(grammar_suggestions)} grammar/spelling suggestions")
return grammar_suggestions
except subprocess.CalledProcessError as e:
logger.error(f"Error running grammalecte-cli: {e}")
logger.error(f"stdout: {e.stdout}")
logger.error(f"stderr: {e.stderr}")
return []
except json.JSONDecodeError as e:
logger.error(f"Error parsing grammalecte-cli output: {e}")
return []
except Exception as e:
logger.error(f"Unexpected error during grammar checking: {e}")
return []
def main():
"""Main function to execute the script"""
logger.info("Starting fix_grammar_suggestions.py")
# Load outdated pages
data = load_outdated_pages()
if not data:
logger.error("Failed to load outdated pages")
return
# Find the "type" page in the regular_pages array
type_page = None
for i, page in enumerate(data.get('regular_pages', [])):
if page.get('key') == TARGET_KEY:
type_page = page
type_page_index = i
break
if not type_page:
logger.error(f"Could not find page with key '{TARGET_KEY}'")
return
# Get the French page URL
fr_page = type_page.get('fr_page')
if not fr_page:
logger.error(f"No French page found for key '{TARGET_KEY}'")
return
fr_url = fr_page.get('url')
if not fr_url:
logger.error(f"No URL found for French page of key '{TARGET_KEY}'")
return
# Fetch the content of the French page
content = fetch_wiki_page_content(fr_url)
if not content:
logger.error(f"Could not fetch content from {fr_url}")
return
# Check grammar
logger.info(f"Checking grammar for key '{TARGET_KEY}'")
suggestions = check_grammar_with_grammalecte(content)
if not suggestions:
logger.warning("No grammar suggestions found or grammar checker not available")
# Add the grammar suggestions to the page
type_page['grammar_suggestions'] = suggestions
# Update the page in the data
data['regular_pages'][type_page_index] = type_page
# Save the updated data
save_outdated_pages(data)
logger.info("Script completed successfully")
if __name__ == "__main__":
main()

View file

@ -0,0 +1 @@
sudo apt install aspell aspell-fr grammalecte-cli
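# Quick sanity check once installed (the grammar scripts shell out to this binary):
# grammalecte-cli --help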

226
wiki_compare/post_outdated_page.py
View file

@ -0,0 +1,226 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
post_outdated_page.py
This script reads the outdated_pages.json file generated by wiki_compare.py,
randomly selects an outdated French wiki page, and posts a message on Mastodon
suggesting that the page needs updating.
Usage:
python post_outdated_page.py [--dry-run]
Options:
--dry-run Run the script without actually posting to Mastodon
Output:
- A post on Mastodon about an outdated French wiki page
- Log messages about the selected page and posting status
"""
import json
import random
import argparse
import logging
import os
from datetime import datetime
import requests
import re
# Configure logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(levelname)s - %(message)s',
datefmt='%Y-%m-%d %H:%M:%S'
)
logger = logging.getLogger(__name__)
# Function to read variables from .env file
def read_env_file(env_file_path=".env"):
"""
Read environment variables from a .env file
Args:
env_file_path (str): Path to the .env file
Returns:
dict: Dictionary of environment variables
"""
env_vars = {}
try:
with open(env_file_path, 'r', encoding='utf-8') as f:
for line in f:
line = line.strip()
# Skip comments and empty lines
if not line or line.startswith('#'):
continue
# Match variable assignments (KEY=VALUE)
match = re.match(r'^([A-Za-z0-9_]+)=(.*)$', line)
if match:
key, value = match.groups()
# Remove quotes if present
value = value.strip('\'"')
env_vars[key] = value
logger.info(f"Successfully loaded environment variables from {env_file_path}")
return env_vars
except IOError as e:
logger.error(f"Error reading .env file {env_file_path}: {e}")
return {}
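# Expected .env content (illustrative value):
# MASTODON_ACCESS_TOKEN=your-token-here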
# Constants
OUTDATED_PAGES_FILE = "outdated_pages.json"
MASTODON_API_URL = "https://mastodon.cipherbliss.com/api/v1/statuses" # Replace with actual instance
# Read MASTODON_ACCESS_TOKEN from .env file
env_vars = read_env_file(".env")
if not env_vars and os.path.exists(os.path.join(os.path.dirname(__file__), ".env")):
# Try with absolute path if relative path fails
env_vars = read_env_file(os.path.join(os.path.dirname(__file__), ".env"))
MASTODON_ACCESS_TOKEN = env_vars.get("MASTODON_ACCESS_TOKEN") or os.environ.get("MASTODON_ACCESS_TOKEN")
def load_outdated_pages():
"""
Load the outdated pages from the JSON file
Returns:
list: List of dictionaries containing outdated page information
"""
try:
with open(OUTDATED_PAGES_FILE, 'r', encoding='utf-8') as f:
pages = json.load(f)
logger.info(f"Successfully loaded {len(pages)} outdated pages from {OUTDATED_PAGES_FILE}")
return pages
except (IOError, json.JSONDecodeError) as e:
logger.error(f"Error loading outdated pages from {OUTDATED_PAGES_FILE}: {e}")
return []
def select_random_outdated_page(pages):
"""
Randomly select an outdated French page from the list
Args:
pages (list): List of dictionaries containing outdated page information
Returns:
dict: Randomly selected outdated page or None if no suitable pages found
"""
# Filter pages to include only those with a French page (not missing)
pages_with_fr = [page for page in pages if page.get('fr_page') is not None]
if not pages_with_fr:
logger.warning("No outdated French pages found")
return None
# Randomly select a page
selected_page = random.choice(pages_with_fr)
logger.info(f"Randomly selected page for key '{selected_page['key']}'")
return selected_page
def create_mastodon_post(page):
"""
Create a Mastodon post about the outdated wiki page
Args:
page (dict): Dictionary containing outdated page information
Returns:
str: Formatted Mastodon post text
"""
key = page['key']
reason = page['reason']
fr_url = page['fr_page']['url']
en_url = page['en_page']['url']
# Format the post
post = f"""📝 La page wiki OSM pour la clé #{key} a besoin d'une mise à jour !
Raison : {reason}
Vous pouvez aider en mettant à jour la page française :
{fr_url}
Page anglaise de référence :
{en_url}
#OpenStreetMap #OSM #Wiki #Contribution #Traduction"""
return post
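# Minimal input expected by create_mastodon_post (only these fields are read;
# the values below are illustrative):
# {"key": "building", "reason": "French page outdated",
#  "fr_page": {"url": "https://wiki.openstreetmap.org/wiki/FR:Key:building"},
#  "en_page": {"url": "https://wiki.openstreetmap.org/wiki/Key:building"}}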
def post_to_mastodon(post_text, dry_run=False):
"""
Post the message to Mastodon
Args:
post_text (str): Text to post
dry_run (bool): If True, don't actually post to Mastodon
Returns:
bool: True if posting was successful or dry run, False otherwise
"""
if dry_run:
logger.info("DRY RUN: Would have posted to Mastodon:")
logger.info(post_text)
return True
if not MASTODON_ACCESS_TOKEN:
logger.error("MASTODON_ACCESS_TOKEN not found in .env file or environment variables")
return False
headers = {
"Authorization": f"Bearer {MASTODON_ACCESS_TOKEN}",
"Content-Type": "application/json"
}
data = {
"status": post_text,
"visibility": "public"
}
try:
response = requests.post(MASTODON_API_URL, headers=headers, json=data)
response.raise_for_status()
logger.info("Successfully posted to Mastodon")
return True
except requests.exceptions.RequestException as e:
logger.error(f"Error posting to Mastodon: {e}")
return False
def main():
"""Main function to execute the script"""
parser = argparse.ArgumentParser(description="Post about an outdated OSM wiki page on Mastodon")
parser.add_argument("--dry-run", action="store_true", help="Run without actually posting to Mastodon")
args = parser.parse_args()
logger.info("Starting post_outdated_page.py")
# Load outdated pages
outdated_pages = load_outdated_pages()
if not outdated_pages:
logger.error("No outdated pages found. Run wiki_compare.py first.")
return
# Select a random outdated page
selected_page = select_random_outdated_page(outdated_pages)
if not selected_page:
logger.error("Could not select an outdated page.")
return
# Create the post text
post_text = create_mastodon_post(selected_page)
# Post to Mastodon
success = post_to_mastodon(post_text, args.dry_run)
if success:
logger.info("Script completed successfully")
else:
logger.error("Script completed with errors")
if __name__ == "__main__":
main()

233
wiki_compare/propose_translation.py
View file

@ -0,0 +1,233 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
propose_translation.py
This script reads the outdated_pages.json file, selects a wiki page (by default the first one),
and uses Ollama with the "mistral:7b" model to propose a translation of the page.
The translation is saved in the "proposed_translation" property of the JSON file.
Usage:
python propose_translation.py [--page KEY]
Options:
--page KEY Specify the key of the page to translate (default: first page in the file)
Output:
- Updated outdated_pages.json file with proposed translations
"""
import json
import argparse
import logging
import requests
import os
import sys
from bs4 import BeautifulSoup
# Configure logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(levelname)s - %(message)s',
datefmt='%Y-%m-%d %H:%M:%S'
)
logger = logging.getLogger(__name__)
# Constants
OUTDATED_PAGES_FILE = "outdated_pages.json"
OLLAMA_API_URL = "http://localhost:11434/api/generate"
OLLAMA_MODEL = "mistral:7b"
def load_outdated_pages():
"""
Load the outdated pages from the JSON file
Returns:
list: List of dictionaries containing outdated page information
"""
try:
with open(OUTDATED_PAGES_FILE, 'r', encoding='utf-8') as f:
pages = json.load(f)
logger.info(f"Successfully loaded {len(pages)} pages from {OUTDATED_PAGES_FILE}")
return pages
except (IOError, json.JSONDecodeError) as e:
logger.error(f"Error loading pages from {OUTDATED_PAGES_FILE}: {e}")
return []
def save_to_json(data, filename):
"""
Save data to a JSON file
Args:
data: Data to save
filename (str): Name of the file
"""
try:
with open(filename, 'w', encoding='utf-8') as f:
json.dump(data, f, indent=2, ensure_ascii=False)
logger.info(f"Data saved to {filename}")
except IOError as e:
logger.error(f"Error saving data to {filename}: {e}")
def fetch_wiki_page_content(url):
"""
Fetch the content of a wiki page
Args:
url (str): URL of the wiki page
Returns:
str: Content of the wiki page
"""
try:
response = requests.get(url)
response.raise_for_status()
soup = BeautifulSoup(response.text, 'html.parser')
# Get the main content
content = soup.select_one('#mw-content-text')
if content:
# Remove script and style elements
for script in content.select('script, style'):
script.extract()
# Remove .languages elements
for languages_elem in content.select('.languages'):
languages_elem.extract()
# Get text
text = content.get_text(separator=' ', strip=True)
return text
else:
logger.warning(f"Could not find content in page: {url}")
return ""
except requests.exceptions.RequestException as e:
logger.error(f"Error fetching wiki page content: {e}")
return ""
def translate_with_ollama(text, model=OLLAMA_MODEL):
"""
Translate text using Ollama
Args:
text (str): Text to translate
model (str): Ollama model to use
Returns:
str: Translated text
"""
prompt = f"""
Tu es un traducteur professionnel spécialisé dans la traduction de documentation technique de l'anglais vers le français.
Traduis le texte suivant de l'anglais vers le français. Conserve le formatage et la structure du texte original.
Ne traduis pas les noms propres, les URLs, et les termes techniques spécifiques à OpenStreetMap.
Texte à traduire:
{text}
"""
try:
logger.info(f"Sending request to Ollama with model {model}")
payload = {
"model": model,
"prompt": prompt,
"stream": False
}
response = requests.post(OLLAMA_API_URL, json=payload)
response.raise_for_status()
result = response.json()
translation = result.get('response', '')
logger.info(f"Successfully received translation from Ollama")
return translation
except requests.exceptions.RequestException as e:
logger.error(f"Error translating with Ollama: {e}")
return ""
def select_page_for_translation(pages, key=None):
"""
Select a page for translation
Args:
pages (list): List of dictionaries containing page information
key (str): Key of the page to select (if None, select the first page)
Returns:
dict: Selected page or None if no suitable page found
"""
if not pages:
logger.warning("No pages found that need translation")
return None
if key:
# Find the page with the specified key
for page in pages:
if page.get('key') == key:
logger.info(f"Selected page for key '{key}' for translation")
return page
logger.warning(f"No page found with key '{key}'")
return None
else:
# Select the first page
selected_page = pages[0]
logger.info(f"Selected first page (key '{selected_page['key']}') for translation")
return selected_page
def main():
"""Main function to execute the script"""
parser = argparse.ArgumentParser(description="Propose a translation for an OSM wiki page using Ollama")
parser.add_argument("--page", help="Key of the page to translate (default: first page in the file)")
args = parser.parse_args()
logger.info("Starting propose_translation.py")
# Load pages
pages = load_outdated_pages()
if not pages:
logger.error("No pages found. Run wiki_compare.py first.")
sys.exit(1)
# Select a page for translation
selected_page = select_page_for_translation(pages, args.page)
if not selected_page:
logger.error("Could not select a page for translation.")
sys.exit(1)
# Get the English page URL
en_url = selected_page.get('en_page', {}).get('url')
if not en_url:
logger.error(f"No English page URL found for key '{selected_page['key']}'")
sys.exit(1)
# Fetch the content of the English page
logger.info(f"Fetching content from {en_url}")
content = fetch_wiki_page_content(en_url)
if not content:
logger.error(f"Could not fetch content from {en_url}")
sys.exit(1)
# Translate the content
logger.info(f"Translating content for key '{selected_page['key']}'")
translation = translate_with_ollama(content)
if not translation:
logger.error("Could not translate content")
sys.exit(1)
# Save the translation in the JSON file
logger.info(f"Saving translation for key '{selected_page['key']}'")
selected_page['proposed_translation'] = translation
# Save the updated data back to the file
save_to_json(pages, OUTDATED_PAGES_FILE)
logger.info("Script completed successfully")
if __name__ == "__main__":
main()

381
wiki_compare/suggest_grammar_improvements.py
View file

@ -0,0 +1,381 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
suggest_grammar_improvements.py
This script reads the outdated_pages.json file, selects a wiki page (by default the first one),
and uses grammalecte to check the grammar of the French page content.
The grammar suggestions are saved in the "grammar_suggestions" property of the JSON file.
The script is compatible with different versions of the grammalecte API:
- For newer versions where GrammarChecker is directly in the grammalecte module
- For older versions where GrammarChecker is in the grammalecte.fr module
Usage:
python suggest_grammar_improvements.py [--page KEY]
Options:
--page KEY Specify the key of the page to check (default: first page in the file)
Output:
- Updated outdated_pages.json file with grammar suggestions
"""
import json
import argparse
import logging
import requests
import os
import sys
import subprocess
from bs4 import BeautifulSoup
try:
import grammalecte
import grammalecte.text as txt
# Check if GrammarChecker is available directly in the grammalecte module (newer versions)
try:
from grammalecte import GrammarChecker
GRAMMALECTE_DIRECT_API = True
except ImportError:
# Try the older API structure with fr submodule
try:
import grammalecte.fr as gr_fr
GRAMMALECTE_DIRECT_API = False
except ImportError:
# Neither API is available
raise ImportError("Could not import GrammarChecker from grammalecte")
GRAMMALECTE_AVAILABLE = True
except ImportError:
GRAMMALECTE_AVAILABLE = False
GRAMMALECTE_DIRECT_API = False
# Configure logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(levelname)s - %(message)s',
datefmt='%Y-%m-%d %H:%M:%S'
)
logger = logging.getLogger(__name__)
# Constants
OUTDATED_PAGES_FILE = "outdated_pages.json"
def load_outdated_pages():
"""
Load the outdated pages from the JSON file
Returns:
list: List of dictionaries containing outdated page information
"""
try:
with open(OUTDATED_PAGES_FILE, 'r', encoding='utf-8') as f:
pages = json.load(f)
logger.info(f"Successfully loaded {len(pages)} pages from {OUTDATED_PAGES_FILE}")
return pages
except (IOError, json.JSONDecodeError) as e:
logger.error(f"Error loading pages from {OUTDATED_PAGES_FILE}: {e}")
return []
def save_to_json(data, filename):
"""
Save data to a JSON file
Args:
data: Data to save
filename (str): Name of the file
"""
try:
with open(filename, 'w', encoding='utf-8') as f:
json.dump(data, f, indent=2, ensure_ascii=False)
logger.info(f"Data saved to {filename}")
except IOError as e:
logger.error(f"Error saving data to {filename}: {e}")
def fetch_wiki_page_content(url):
"""
Fetch the content of a wiki page
Args:
url (str): URL of the wiki page
Returns:
str: Content of the wiki page
"""
try:
response = requests.get(url)
response.raise_for_status()
soup = BeautifulSoup(response.text, 'html.parser')
# Get the main content
content = soup.select_one('#mw-content-text')
if content:
# Remove script and style elements
for script in content.select('script, style'):
script.extract()
# Remove .languages elements
for languages_elem in content.select('.languages'):
languages_elem.extract()
# Get text
text = content.get_text(separator=' ', strip=True)
return text
else:
logger.warning(f"Could not find content in page: {url}")
return ""
except requests.exceptions.RequestException as e:
logger.error(f"Error fetching wiki page content: {e}")
return ""
def check_grammar_with_grammalecte(text):
"""
Check grammar using grammalecte
Args:
text (str): Text to check
Returns:
list: List of grammar suggestions
"""
if not GRAMMALECTE_AVAILABLE:
logger.error("Grammalecte is not installed. Please install it with: pip install grammalecte")
return []
try:
logger.info("Checking grammar with grammalecte")
# Initialize grammalecte based on which API version is available
if GRAMMALECTE_DIRECT_API:
# New API: GrammarChecker is directly in grammalecte module
logger.info("Using direct GrammarChecker API")
gce = GrammarChecker("fr")
# Split text into paragraphs
paragraphs = txt.getParagraph(text)
# Check grammar for each paragraph
suggestions = []
for i, paragraph in enumerate(paragraphs):
if paragraph.strip():
# Use getParagraphErrors method
errors = gce.getParagraphErrors(paragraph)
for error in errors:
# Filter out spelling errors if needed
if "sType" in error and error["sType"] != "WORD" and error.get("bError", True):
suggestion = {
"paragraph": i + 1,
"start": error.get("nStart", 0),
"end": error.get("nEnd", 0),
"type": error.get("sType", ""),
"message": error.get("sMessage", ""),
"suggestions": error.get("aSuggestions", []),
"context": paragraph[max(0, error.get("nStart", 0) - 20):min(len(paragraph), error.get("nEnd", 0) + 20)]
}
suggestions.append(suggestion)
else:
# Old API: GrammarChecker is in grammalecte.fr module
logger.info("Using legacy grammalecte.fr.GrammarChecker API")
gce = gr_fr.GrammarChecker("fr")
# Split text into paragraphs
paragraphs = txt.getParagraph(text)
# Check grammar for each paragraph
suggestions = []
for i, paragraph in enumerate(paragraphs):
if paragraph.strip():
# Use parse method for older API
for error in gce.parse(paragraph, "FR", False):
if error["sType"] != "WORD" and error["bError"]:
suggestion = {
"paragraph": i + 1,
"start": error["nStart"],
"end": error["nEnd"],
"type": error["sType"],
"message": error["sMessage"],
"suggestions": error.get("aSuggestions", []),
"context": paragraph[max(0, error["nStart"] - 20):min(len(paragraph), error["nEnd"] + 20)]
}
suggestions.append(suggestion)
logger.info(f"Found {len(suggestions)} grammar suggestions")
return suggestions
except Exception as e:
logger.error(f"Error checking grammar with grammalecte: {e}")
return []
def check_grammar_with_cli(text):
"""
Check grammar using grammalecte-cli command
Args:
text (str): Text to check
Returns:
list: List of grammar suggestions
"""
try:
logger.info("Checking grammar with grammalecte-cli")
# Create a temporary file with the text
temp_file = "temp_text_for_grammar_check.txt"
with open(temp_file, 'w', encoding='utf-8') as f:
f.write(text)
# Run grammalecte-cli
cmd = ["grammalecte-cli", "--json", "--file", temp_file]
result = subprocess.run(cmd, capture_output=True, text=True, encoding='utf-8')
# Remove temporary file
if os.path.exists(temp_file):
os.remove(temp_file)
if result.returncode != 0:
logger.error(f"Error running grammalecte-cli: {result.stderr}")
return []
# Parse JSON output
output = json.loads(result.stdout)
# Extract grammar suggestions
suggestions = []
for paragraph_data in output.get("data", []):
paragraph_index = paragraph_data.get("iParagraph", 0)
for error in paragraph_data.get("lGrammarErrors", []):
suggestion = {
"paragraph": paragraph_index + 1,
"start": error.get("nStart", 0),
"end": error.get("nEnd", 0),
"type": error.get("sType", ""),
"message": error.get("sMessage", ""),
"suggestions": error.get("aSuggestions", []),
"context": error.get("sContext", "")
}
suggestions.append(suggestion)
logger.info(f"Found {len(suggestions)} grammar suggestions")
return suggestions
except Exception as e:
logger.error(f"Error checking grammar with grammalecte-cli: {e}")
return []
def check_grammar(text):
"""
Check grammar using available method (Python library or CLI)
Args:
text (str): Text to check
Returns:
list: List of grammar suggestions
"""
# Try using the Python library first
if GRAMMALECTE_AVAILABLE:
return check_grammar_with_grammalecte(text)
# Fall back to CLI if available
try:
# Check if grammalecte-cli is available
subprocess.run(["grammalecte-cli", "--help"], capture_output=True)
return check_grammar_with_cli(text)
except (subprocess.SubprocessError, FileNotFoundError):
logger.error("Neither grammalecte Python package nor grammalecte-cli is available.")
logger.error("Please install grammalecte with: pip install grammalecte")
return []
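# Illustrative call, mirroring main() below: the Python API is preferred and the
# CLI is only used as a fallback when the grammalecte package cannot be imported.
# suggestions = check_grammar(fetch_wiki_page_content("https://wiki.openstreetmap.org/wiki/FR:Key:type"))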
def select_page_for_grammar_check(pages, key=None):
"""
Select a page for grammar checking
Args:
pages (list): List of dictionaries containing page information
key (str): Key of the page to select (if None, select the first page)
Returns:
dict: Selected page or None if no suitable page found
"""
if not pages:
logger.warning("No pages found that need grammar checking")
return None
if key:
# Find the page with the specified key
for page in pages:
if page.get('key') == key:
# Check if the page has a French version
if page.get('fr_page') is None:
logger.warning(f"Page with key '{key}' does not have a French version")
return None
logger.info(f"Selected page for key '{key}' for grammar checking")
return page
logger.warning(f"No page found with key '{key}'")
return None
else:
# Select the first page that has a French version
for page in pages:
if page.get('fr_page') is not None:
logger.info(f"Selected first page with French version (key '{page['key']}') for grammar checking")
return page
logger.warning("No pages found with French versions")
return None
def main():
"""Main function to execute the script"""
parser = argparse.ArgumentParser(description="Suggest grammar improvements for an OSM wiki page using grammalecte")
parser.add_argument("--page", help="Key of the page to check (default: first page with a French version)")
args = parser.parse_args()
logger.info("Starting suggest_grammar_improvements.py")
# Load pages
pages = load_outdated_pages()
if not pages:
logger.error("No pages found. Run wiki_compare.py first.")
sys.exit(1)
# Select a page for grammar checking
selected_page = select_page_for_grammar_check(pages, args.page)
if not selected_page:
logger.error("Could not select a page for grammar checking.")
sys.exit(1)
# Get the French page URL
fr_url = selected_page.get('fr_page', {}).get('url')
if not fr_url:
logger.error(f"No French page URL found for key '{selected_page['key']}'")
sys.exit(1)
# Fetch the content of the French page
logger.info(f"Fetching content from {fr_url}")
content = fetch_wiki_page_content(fr_url)
if not content:
logger.error(f"Could not fetch content from {fr_url}")
sys.exit(1)
# Check grammar
logger.info(f"Checking grammar for key '{selected_page['key']}'")
suggestions = check_grammar(content)
if not suggestions:
logger.warning("No grammar suggestions found or grammar checker not available")
# Save the grammar suggestions in the JSON file
logger.info(f"Saving grammar suggestions for key '{selected_page['key']}'")
selected_page['grammar_suggestions'] = suggestions
# Save the updated data back to the file
save_to_json(pages, OUTDATED_PAGES_FILE)
logger.info("Script completed successfully")
if __name__ == "__main__":
main()

212
wiki_compare/suggest_translation.py
View file

@ -0,0 +1,212 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
suggest_translation.py
This script reads the outdated_pages.json file generated by wiki_compare.py,
identifies English wiki pages that don't have a French translation,
and posts a message on Mastodon suggesting that the page needs translation.
Usage:
python suggest_translation.py [--dry-run]
Options:
--dry-run Run the script without actually posting to Mastodon
Output:
- A post on Mastodon suggesting a wiki page for translation
- Log messages about the selected page and posting status
"""
import json
import random
import argparse
import logging
import os
from datetime import datetime
import requests
# Configure logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(levelname)s - %(message)s',
datefmt='%Y-%m-%d %H:%M:%S'
)
logger = logging.getLogger(__name__)
# Constants
OUTDATED_PAGES_FILE = "outdated_pages.json"
MASTODON_API_URL = "https://mastodon.instance/api/v1/statuses" # Replace with actual instance
MASTODON_ACCESS_TOKEN = os.environ.get("MASTODON_ACCESS_TOKEN")
def load_outdated_pages():
"""
Load the outdated pages from the JSON file
Returns:
list: List of dictionaries containing outdated page information
"""
try:
with open(OUTDATED_PAGES_FILE, 'r', encoding='utf-8') as f:
pages = json.load(f)
logger.info(f"Successfully loaded {len(pages)} pages from {OUTDATED_PAGES_FILE}")
return pages
except (IOError, json.JSONDecodeError) as e:
logger.error(f"Error loading pages from {OUTDATED_PAGES_FILE}: {e}")
return []
def find_missing_translations(pages):
"""
Find English pages that don't have a French translation
Args:
pages (list): List of dictionaries containing page information
Returns:
list: List of pages that need translation
"""
# Filter pages to include only those with a missing French page
missing_translations = [page for page in pages if
page.get('reason') == 'French page missing' and
page.get('en_page') is not None and
page.get('fr_page') is None]
logger.info(f"Found {len(missing_translations)} pages without French translation")
return missing_translations
def select_random_page_for_translation(pages):
"""
Randomly select a page for translation from the list
Args:
pages (list): List of dictionaries containing page information
Returns:
dict: Randomly selected page or None if no suitable pages found
"""
if not pages:
logger.warning("No pages found that need translation")
return None
# Randomly select a page
selected_page = random.choice(pages)
logger.info(f"Randomly selected page for key '{selected_page['key']}' for translation")
return selected_page
def create_mastodon_post(page):
"""
Create a Mastodon post suggesting a page for translation
Args:
page (dict): Dictionary containing page information
Returns:
str: Formatted Mastodon post text
"""
key = page['key']
en_url = page['en_page']['url']
fr_url = en_url.replace('/wiki/Key:', '/wiki/FR:Key:')
# Get word count and sections from English page
word_count = page['en_page']['word_count']
sections = page['en_page']['sections']
# Format the post
post = f"""🔍 Clé OSM sans traduction française : #{key}
Cette page wiki importante n'a pas encore de traduction française !
📊 Statistiques de la page anglaise :
{word_count} mots
{sections} sections
Vous pouvez aider en créant la traduction française ici :
{fr_url}
Page anglaise à traduire :
{en_url}
#OpenStreetMap #OSM #Wiki #Traduction #Contribution"""
return post
def post_to_mastodon(post_text, dry_run=False):
"""
Post the message to Mastodon
Args:
post_text (str): Text to post
dry_run (bool): If True, don't actually post to Mastodon
Returns:
bool: True if posting was successful or dry run, False otherwise
"""
if dry_run:
logger.info("DRY RUN: Would have posted to Mastodon:")
logger.info(post_text)
return True
if not MASTODON_ACCESS_TOKEN:
logger.error("MASTODON_ACCESS_TOKEN environment variable not set")
return False
headers = {
"Authorization": f"Bearer {MASTODON_ACCESS_TOKEN}",
"Content-Type": "application/json"
}
data = {
"status": post_text,
"visibility": "public"
}
try:
response = requests.post(MASTODON_API_URL, headers=headers, json=data)
response.raise_for_status()
logger.info("Successfully posted to Mastodon")
return True
except requests.exceptions.RequestException as e:
logger.error(f"Error posting to Mastodon: {e}")
return False
def main():
"""Main function to execute the script"""
parser = argparse.ArgumentParser(description="Suggest an OSM wiki page for translation on Mastodon")
parser.add_argument("--dry-run", action="store_true", help="Run without actually posting to Mastodon")
args = parser.parse_args()
logger.info("Starting suggest_translation.py")
# Load pages
pages = load_outdated_pages()
if not pages:
logger.error("No pages found. Run wiki_compare.py first.")
return
# Find pages that need translation
pages_for_translation = find_missing_translations(pages)
if not pages_for_translation:
logger.error("No pages found that need translation.")
return
# Select a random page for translation
selected_page = select_random_page_for_translation(pages_for_translation)
if not selected_page:
logger.error("Could not select a page for translation.")
return
# Create the post text
post_text = create_mastodon_post(selected_page)
# Post to Mastodon
success = post_to_mastodon(post_text, args.dry_run)
if success:
logger.info("Script completed successfully")
else:
logger.error("Script completed with errors")
if __name__ == "__main__":
main()

70
wiki_compare/test_json.py Normal file
View file

@ -0,0 +1,70 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
test_json.py
This script tests writing a JSON file with some test data.
"""
import json
import os
from datetime import datetime
# Test data
test_data = {
"last_updated": datetime.now().isoformat(),
"recent_changes": [
{
"page_name": "Test Page 1",
"page_url": "https://example.com/test1",
"timestamp": "12:34",
"user": "Test User 1",
"comment": "Test comment 1",
"change_size": "+123"
},
{
"page_name": "Test Page 2",
"page_url": "https://example.com/test2",
"timestamp": "23:45",
"user": "Test User 2",
"comment": "Test comment 2",
"change_size": "-456"
}
]
}
# Output file
output_file = "test_recent_changes.json"
# Write the data to the file
print(f"Writing test data to {output_file}")
with open(output_file, 'w', encoding='utf-8') as f:
json.dump(test_data, f, indent=2, ensure_ascii=False)
# Check if the file was created
if os.path.exists(output_file):
file_size = os.path.getsize(output_file)
print(f"File {output_file} created, size: {file_size} bytes")
# Read the content of the file to verify
with open(output_file, 'r', encoding='utf-8') as f:
file_content = f.read()
print(f"File content: {file_content}")
else:
print(f"Failed to create file {output_file}")
# Copy the file to the public directory (use an absolute path so the sibling
# public/ directory is found even when output_file is a bare relative name)
public_file = os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(output_file))), 'public', os.path.basename(output_file))
print(f"Copying {output_file} to {public_file}")
import shutil
shutil.copy2(output_file, public_file)
# Check if the public file was created
if os.path.exists(public_file):
public_size = os.path.getsize(public_file)
print(f"Public file {public_file} created, size: {public_size} bytes")
else:
print(f"Failed to create public file {public_file}")
print("Script completed successfully")

1158
wiki_compare/wiki_compare.py Executable file

File diff suppressed because it is too large

107
wiki_compare/wiki_pages.csv Normal file
View file

@ -0,0 +1,107 @@
key,language,url,last_modified,sections,word_count,link_count,media_count,staleness_score,description_img_url
building,en,https://wiki.openstreetmap.org/wiki/Key:building,2025-06-10,31,3774,627,158,8.91,https://wiki.openstreetmap.org/w/images/thumb/6/61/Emptyhouse.jpg/200px-Emptyhouse.jpg
building,fr,https://wiki.openstreetmap.org/wiki/FR:Key:building,2025-05-22,25,3181,544,155,8.91,https://wiki.openstreetmap.org/w/images/thumb/6/61/Emptyhouse.jpg/200px-Emptyhouse.jpg
source,en,https://wiki.openstreetmap.org/wiki/Key:source,2025-08-12,27,2752,314,42,113.06,https://wiki.openstreetmap.org/w/images/thumb/7/76/Osm_element_node.svg/30px-Osm_element_node.svg.png
source,fr,https://wiki.openstreetmap.org/wiki/FR:Key:source,2024-02-07,23,2593,230,35,113.06,https://wiki.openstreetmap.org/w/images/thumb/7/76/Osm_element_node.svg/30px-Osm_element_node.svg.png
highway,en,https://wiki.openstreetmap.org/wiki/Key:highway,2025-04-10,30,4126,780,314,20.35,https://upload.wikimedia.org/wikipedia/commons/thumb/7/78/Roads_in_Switzerland_%2827965437018%29.jpg/200px-Roads_in_Switzerland_%2827965437018%29.jpg
highway,fr,https://wiki.openstreetmap.org/wiki/FR:Key:highway,2025-01-05,30,4141,695,313,20.35,https://upload.wikimedia.org/wikipedia/commons/thumb/7/78/Roads_in_Switzerland_%2827965437018%29.jpg/200px-Roads_in_Switzerland_%2827965437018%29.jpg
addr:housenumber,en,https://wiki.openstreetmap.org/wiki/Key:addr:housenumber,2025-07-24,11,330,97,20,14.01,https://upload.wikimedia.org/wikipedia/commons/thumb/1/16/Ferry_Street%2C_Portaferry_%2809%29%2C_October_2009.JPG/200px-Ferry_Street%2C_Portaferry_%2809%29%2C_October_2009.JPG
addr:housenumber,fr,https://wiki.openstreetmap.org/wiki/FR:Key:addr:housenumber,2025-08-23,15,1653,150,77,14.01,https://wiki.openstreetmap.org/w/images/thumb/e/e9/Housenumber-karlsruhe-de.png/200px-Housenumber-karlsruhe-de.png
addr:street,en,https://wiki.openstreetmap.org/wiki/Key:addr:street,2024-10-29,12,602,101,16,66.04,https://upload.wikimedia.org/wikipedia/commons/thumb/6/64/UK_-_London_%2830474933636%29.jpg/200px-UK_-_London_%2830474933636%29.jpg
addr:street,fr,https://wiki.openstreetmap.org/wiki/FR:Key:addr:street,2025-08-23,15,1653,150,77,66.04,https://wiki.openstreetmap.org/w/images/thumb/e/e9/Housenumber-karlsruhe-de.png/200px-Housenumber-karlsruhe-de.png
addr:city,en,https://wiki.openstreetmap.org/wiki/Key:addr:city,2025-07-29,15,802,105,17,9.93,https://upload.wikimedia.org/wikipedia/commons/thumb/1/18/Lillerod.jpg/200px-Lillerod.jpg
addr:city,fr,https://wiki.openstreetmap.org/wiki/FR:Key:addr:city,2025-08-23,15,1653,150,77,9.93,https://wiki.openstreetmap.org/w/images/thumb/e/e9/Housenumber-karlsruhe-de.png/200px-Housenumber-karlsruhe-de.png
name,en,https://wiki.openstreetmap.org/wiki/Key:name,2025-07-25,17,2196,281,82,42.39,https://upload.wikimedia.org/wikipedia/commons/thumb/6/61/Helena%2C_Montana.jpg/200px-Helena%2C_Montana.jpg
name,fr,https://wiki.openstreetmap.org/wiki/FR:Key:name,2025-01-16,21,1720,187,60,42.39,https://wiki.openstreetmap.org/w/images/3/37/Strakers.jpg
addr:postcode,en,https://wiki.openstreetmap.org/wiki/Key:addr:postcode,2024-10-29,14,382,83,11,67.11,https://upload.wikimedia.org/wikipedia/commons/thumb/0/04/Farrer_post_code.jpg/200px-Farrer_post_code.jpg
addr:postcode,fr,https://wiki.openstreetmap.org/wiki/FR:Key:addr:postcode,2025-08-23,15,1653,150,77,67.11,https://wiki.openstreetmap.org/w/images/thumb/e/e9/Housenumber-karlsruhe-de.png/200px-Housenumber-karlsruhe-de.png
natural,en,https://wiki.openstreetmap.org/wiki/Key:natural,2025-07-17,17,2070,535,189,22.06,https://upload.wikimedia.org/wikipedia/commons/thumb/0/0e/VocaDi-Nature%2CGeneral.jpeg/200px-VocaDi-Nature%2CGeneral.jpeg
natural,fr,https://wiki.openstreetmap.org/wiki/FR:Key:natural,2025-04-21,13,1499,455,174,22.06,https://upload.wikimedia.org/wikipedia/commons/thumb/0/0e/VocaDi-Nature%2CGeneral.jpeg/200px-VocaDi-Nature%2CGeneral.jpeg
surface,en,https://wiki.openstreetmap.org/wiki/Key:surface,2025-08-28,24,3475,591,238,264.64,https://upload.wikimedia.org/wikipedia/commons/thumb/a/a2/Transportation_in_Tanzania_Traffic_problems.JPG/200px-Transportation_in_Tanzania_Traffic_problems.JPG
surface,fr,https://wiki.openstreetmap.org/wiki/FR:Key:surface,2022-02-22,13,2587,461,232,264.64,https://upload.wikimedia.org/wikipedia/commons/thumb/a/a2/Transportation_in_Tanzania_Traffic_problems.JPG/200px-Transportation_in_Tanzania_Traffic_problems.JPG
addr:country,en,https://wiki.openstreetmap.org/wiki/Key:addr:country,2024-12-01,9,184,65,11,22.96,https://upload.wikimedia.org/wikipedia/commons/thumb/8/86/Europe_ISO_3166-1.svg/200px-Europe_ISO_3166-1.svg.png
addr:country,fr,https://wiki.openstreetmap.org/wiki/FR:Key:addr:country,2025-03-25,8,187,65,11,22.96,https://upload.wikimedia.org/wikipedia/commons/thumb/8/86/Europe_ISO_3166-1.svg/200px-Europe_ISO_3166-1.svg.png
landuse,en,https://wiki.openstreetmap.org/wiki/Key:landuse,2025-03-01,17,2071,446,168,39.41,https://upload.wikimedia.org/wikipedia/commons/thumb/d/d3/Changing_landuse_-_geograph.org.uk_-_1137810.jpg/200px-Changing_landuse_-_geograph.org.uk_-_1137810.jpg
landuse,fr,https://wiki.openstreetmap.org/wiki/FR:Key:landuse,2024-08-20,19,2053,418,182,39.41,https://upload.wikimedia.org/wikipedia/commons/thumb/d/d3/Changing_landuse_-_geograph.org.uk_-_1137810.jpg/200px-Changing_landuse_-_geograph.org.uk_-_1137810.jpg
power,en,https://wiki.openstreetmap.org/wiki/Key:power,2025-02-28,20,641,127,21,124.89,https://wiki.openstreetmap.org/w/images/thumb/0/01/Power-tower.JPG/200px-Power-tower.JPG
power,fr,https://wiki.openstreetmap.org/wiki/FR:Key:power,2023-06-27,14,390,105,25,124.89,https://wiki.openstreetmap.org/w/images/thumb/0/01/Power-tower.JPG/200px-Power-tower.JPG
waterway,en,https://wiki.openstreetmap.org/wiki/Key:waterway,2025-03-10,21,1830,365,118,77.94,https://wiki.openstreetmap.org/w/images/thumb/f/fe/450px-Marshall-county-indiana-yellow-river.jpg/200px-450px-Marshall-county-indiana-yellow-river.jpg
waterway,fr,https://wiki.openstreetmap.org/wiki/FR:Key:waterway,2024-03-08,18,1291,272,113,77.94,https://wiki.openstreetmap.org/w/images/thumb/f/fe/450px-Marshall-county-indiana-yellow-river.jpg/200px-450px-Marshall-county-indiana-yellow-river.jpg
building:levels,en,https://wiki.openstreetmap.org/wiki/Key:building:levels,2025-08-13,16,1351,204,25,76.11,https://wiki.openstreetmap.org/w/images/thumb/4/47/Building-levels.png/200px-Building-levels.png
building:levels,fr,https://wiki.openstreetmap.org/wiki/FR:Key:building:levels,2024-08-01,15,1457,202,26,76.11,https://wiki.openstreetmap.org/w/images/thumb/4/47/Building-levels.png/200px-Building-levels.png
amenity,en,https://wiki.openstreetmap.org/wiki/Key:amenity,2025-08-24,29,3066,915,504,160.78,https://wiki.openstreetmap.org/w/images/thumb/a/a5/Mapping-Features-Parking-Lot.png/200px-Mapping-Features-Parking-Lot.png
amenity,fr,https://wiki.openstreetmap.org/wiki/FR:Key:amenity,2023-07-19,22,2146,800,487,160.78,https://wiki.openstreetmap.org/w/images/thumb/a/a5/Mapping-Features-Parking-Lot.png/200px-Mapping-Features-Parking-Lot.png
barrier,en,https://wiki.openstreetmap.org/wiki/Key:barrier,2025-04-15,17,2137,443,173,207.98,https://upload.wikimedia.org/wikipedia/commons/thumb/4/4c/2014_Bystrzyca_K%C5%82odzka%2C_mury_obronne_05.jpg/200px-2014_Bystrzyca_K%C5%82odzka%2C_mury_obronne_05.jpg
barrier,fr,https://wiki.openstreetmap.org/wiki/FR:Key:barrier,2022-08-16,15,542,103,18,207.98,https://upload.wikimedia.org/wikipedia/commons/thumb/4/4c/2014_Bystrzyca_K%C5%82odzka%2C_mury_obronne_05.jpg/200px-2014_Bystrzyca_K%C5%82odzka%2C_mury_obronne_05.jpg
source:date,en,https://wiki.openstreetmap.org/wiki/Key:source:date,2023-04-01,11,395,75,10,22.47,https://wiki.openstreetmap.org/w/images/thumb/7/76/Osm_element_node.svg/30px-Osm_element_node.svg.png
source:date,fr,https://wiki.openstreetmap.org/wiki/FR:Key:source:date,2023-07-21,10,419,75,11,22.47,https://wiki.openstreetmap.org/w/images/thumb/7/76/Osm_element_node.svg/30px-Osm_element_node.svg.png
service,en,https://wiki.openstreetmap.org/wiki/Key:service,2025-03-16,22,1436,218,17,83.79,https://wiki.openstreetmap.org/w/images/thumb/7/76/Osm_element_node.svg/30px-Osm_element_node.svg.png
service,fr,https://wiki.openstreetmap.org/wiki/FR:Key:service,2024-03-04,11,443,100,10,83.79,https://wiki.openstreetmap.org/w/images/thumb/7/76/Osm_element_node.svg/30px-Osm_element_node.svg.png
addr:state,en,https://wiki.openstreetmap.org/wiki/Key:addr:state,2023-06-23,12,289,74,11,100,https://upload.wikimedia.org/wikipedia/commons/thumb/e/ef/WVaCent.jpg/200px-WVaCent.jpg
access,en,https://wiki.openstreetmap.org/wiki/Key:access,2025-08-06,31,5803,708,98,66.75,https://wiki.openstreetmap.org/w/images/5/5e/WhichAccess.png
access,fr,https://wiki.openstreetmap.org/wiki/FR:Key:access,2024-11-27,33,3200,506,83,66.75,https://wiki.openstreetmap.org/w/images/5/5e/WhichAccess.png
oneway,en,https://wiki.openstreetmap.org/wiki/Key:oneway,2025-07-17,28,2318,290,30,19.4,https://upload.wikimedia.org/wikipedia/commons/thumb/1/13/One_way_sign.JPG/200px-One_way_sign.JPG
oneway,fr,https://wiki.openstreetmap.org/wiki/FR:Key:oneway,2025-06-16,14,645,108,14,19.4,https://upload.wikimedia.org/wikipedia/commons/thumb/f/f4/France_road_sign_C12.svg/200px-France_road_sign_C12.svg.png
height,en,https://wiki.openstreetmap.org/wiki/Key:height,2025-07-21,24,1184,184,20,8.45,https://upload.wikimedia.org/wikipedia/commons/thumb/8/88/Height_demonstration_diagram.png/200px-Height_demonstration_diagram.png
height,fr,https://wiki.openstreetmap.org/wiki/FR:Key:height,2025-06-14,21,1285,190,21,8.45,https://upload.wikimedia.org/wikipedia/commons/thumb/8/88/Height_demonstration_diagram.png/200px-Height_demonstration_diagram.png
ref,en,https://wiki.openstreetmap.org/wiki/Key:ref,2025-07-25,26,4404,782,115,11.79,https://upload.wikimedia.org/wikipedia/commons/thumb/3/3d/UK_traffic_sign_2901.svg/200px-UK_traffic_sign_2901.svg.png
ref,fr,https://wiki.openstreetmap.org/wiki/FR:Key:ref,2025-07-30,20,3393,460,12,11.79,https://upload.wikimedia.org/wikipedia/commons/thumb/a/a4/Autoroute_fran%C3%A7aise_1.svg/200px-Autoroute_fran%C3%A7aise_1.svg.png
maxspeed,en,https://wiki.openstreetmap.org/wiki/Key:maxspeed,2025-08-20,30,4275,404,38,39.24,https://upload.wikimedia.org/wikipedia/commons/thumb/6/63/Zeichen_274-60_-_Zul%C3%A4ssige_H%C3%B6chstgeschwindigkeit%2C_StVO_2017.svg/200px-Zeichen_274-60_-_Zul%C3%A4ssige_H%C3%B6chstgeschwindigkeit%2C_StVO_2017.svg.png
maxspeed,fr,https://wiki.openstreetmap.org/wiki/FR:Key:maxspeed,2025-05-10,25,1401,156,23,39.24,https://upload.wikimedia.org/wikipedia/commons/thumb/6/63/Zeichen_274-60_-_Zul%C3%A4ssige_H%C3%B6chstgeschwindigkeit%2C_StVO_2017.svg/200px-Zeichen_274-60_-_Zul%C3%A4ssige_H%C3%B6chstgeschwindigkeit%2C_StVO_2017.svg.png
lanes,en,https://wiki.openstreetmap.org/wiki/Key:lanes,2025-08-21,26,2869,355,48,117.16,https://upload.wikimedia.org/wikipedia/commons/thumb/f/f4/A55_trunk_road_looking_east_-_geograph.org.uk_-_932668.jpg/200px-A55_trunk_road_looking_east_-_geograph.org.uk_-_932668.jpg
lanes,fr,https://wiki.openstreetmap.org/wiki/FR:Key:lanes,2024-03-07,19,1492,167,19,117.16,https://wiki.openstreetmap.org/w/images/thumb/d/d4/Dscf0444_600.jpg/200px-Dscf0444_600.jpg
start_date,en,https://wiki.openstreetmap.org/wiki/Key:start_date,2025-08-01,22,1098,168,29,214.58,https://upload.wikimedia.org/wikipedia/commons/thumb/d/dc/Connel_bridge_plate.jpg/200px-Connel_bridge_plate.jpg
start_date,fr,https://wiki.openstreetmap.org/wiki/FR:Key:start_date,2022-08-29,19,1097,133,22,214.58,https://upload.wikimedia.org/wikipedia/commons/thumb/d/dc/Connel_bridge_plate.jpg/200px-Connel_bridge_plate.jpg
addr:district,en,https://wiki.openstreetmap.org/wiki/Key:addr:district,2023-11-06,11,244,76,11,139.96,https://upload.wikimedia.org/wikipedia/commons/thumb/d/d0/Hangal_Taluk.jpg/200px-Hangal_Taluk.jpg
addr:district,fr,https://wiki.openstreetmap.org/wiki/FR:Key:addr:district,2025-08-23,15,1653,150,77,139.96,https://wiki.openstreetmap.org/w/images/thumb/e/e9/Housenumber-karlsruhe-de.png/200px-Housenumber-karlsruhe-de.png
layer,en,https://wiki.openstreetmap.org/wiki/Key:layer,2025-01-02,16,1967,181,17,65.95,https://wiki.openstreetmap.org/w/images/thumb/2/26/Washington_layers.png/200px-Washington_layers.png
layer,fr,https://wiki.openstreetmap.org/wiki/FR:Key:layer,2024-02-16,15,2231,162,17,65.95,https://wiki.openstreetmap.org/w/images/thumb/2/26/Washington_layers.png/200px-Washington_layers.png
type,en,https://wiki.openstreetmap.org/wiki/Key:type,2025-05-13,20,911,200,72,334.06,https://wiki.openstreetmap.org/w/images/thumb/5/58/Osm_element_node_no.svg/30px-Osm_element_node_no.svg.png
type,fr,https://wiki.openstreetmap.org/wiki/FR:Key:type,2020-11-13,10,444,78,10,334.06,https://wiki.openstreetmap.org/w/images/thumb/5/58/Osm_element_node_no.svg/30px-Osm_element_node_no.svg.png
operator,en,https://wiki.openstreetmap.org/wiki/Key:operator,2025-08-26,24,1908,241,37,223.28,https://wiki.openstreetmap.org/w/images/thumb/7/76/Osm_element_node.svg/30px-Osm_element_node.svg.png
operator,fr,https://wiki.openstreetmap.org/wiki/FR:Key:operator,2022-09-30,15,418,89,11,223.28,https://wiki.openstreetmap.org/w/images/thumb/7/76/Osm_element_node.svg/30px-Osm_element_node.svg.png
lit,en,https://wiki.openstreetmap.org/wiki/Key:lit,2024-07-20,17,931,174,52,38.88,https://upload.wikimedia.org/wikipedia/commons/thumb/e/e2/Peatonal_Bicentenario.JPG/200px-Peatonal_Bicentenario.JPG
lit,fr,https://wiki.openstreetmap.org/wiki/FR:Key:lit,2025-01-19,17,628,123,14,38.88,https://upload.wikimedia.org/wikipedia/commons/thumb/f/fd/2014_K%C5%82odzko%2C_ul._Grottgera_14.JPG/200px-2014_K%C5%82odzko%2C_ul._Grottgera_14.JPG
wall,en,https://wiki.openstreetmap.org/wiki/Key:wall,2024-05-02,14,682,206,61,100,https://wiki.openstreetmap.org/w/images/thumb/5/58/Osm_element_node_no.svg/30px-Osm_element_node_no.svg.png
tiger:cfcc,en,https://wiki.openstreetmap.org/wiki/Key:tiger:cfcc,2022-12-09,10,127,24,7,100,https://wiki.openstreetmap.org/w/images/thumb/7/76/Osm_element_node.svg/30px-Osm_element_node.svg.png
crossing,en,https://wiki.openstreetmap.org/wiki/Key:crossing,2024-02-18,25,2678,363,34,76.98,https://wiki.openstreetmap.org/w/images/thumb/7/75/Toucan.jpg/200px-Toucan.jpg
crossing,fr,https://wiki.openstreetmap.org/wiki/FR:Key:crossing,2025-01-20,15,1390,254,28,76.98,https://wiki.openstreetmap.org/w/images/thumb/7/75/Toucan.jpg/200px-Toucan.jpg
tiger:county,en,https://wiki.openstreetmap.org/wiki/Key:tiger:county,2022-12-09,10,127,24,7,100,https://wiki.openstreetmap.org/w/images/thumb/7/76/Osm_element_node.svg/30px-Osm_element_node.svg.png
source:addr,en,https://wiki.openstreetmap.org/wiki/Key:source:addr,2023-07-05,9,200,70,10,100,https://wiki.openstreetmap.org/w/images/thumb/7/76/Osm_element_node.svg/30px-Osm_element_node.svg.png
footway,en,https://wiki.openstreetmap.org/wiki/Key:footway,2025-08-20,23,2002,369,39,99.66,https://wiki.openstreetmap.org/w/images/thumb/b/b9/Sidewalk_and_zebra-crossing.jpg/200px-Sidewalk_and_zebra-crossing.jpg
footway,fr,https://wiki.openstreetmap.org/wiki/FR:Key:footway,2024-06-04,14,685,147,28,99.66,https://wiki.openstreetmap.org/w/images/thumb/b/b9/Sidewalk_and_zebra-crossing.jpg/200px-Sidewalk_and_zebra-crossing.jpg
ref:bag,en,https://wiki.openstreetmap.org/wiki/Key:ref:bag,2024-10-09,10,254,69,11,100,https://wiki.openstreetmap.org/w/images/thumb/5/58/Osm_element_node_no.svg/30px-Osm_element_node_no.svg.png
addr:place,en,https://wiki.openstreetmap.org/wiki/Key:addr:place,2025-03-28,16,1204,154,13,136.57,https://upload.wikimedia.org/wikipedia/commons/thumb/e/e4/Suburb_of_Phillip.jpg/200px-Suburb_of_Phillip.jpg
addr:place,fr,https://wiki.openstreetmap.org/wiki/FR:Key:addr:place,2023-06-17,11,276,75,12,136.57,https://upload.wikimedia.org/wikipedia/commons/thumb/e/e4/Suburb_of_Phillip.jpg/200px-Suburb_of_Phillip.jpg
tiger:reviewed,en,https://wiki.openstreetmap.org/wiki/Key:tiger:reviewed,2025-08-01,16,734,105,11,100,https://upload.wikimedia.org/wikipedia/commons/thumb/e/e4/US-Census-TIGERLogo.svg/200px-US-Census-TIGERLogo.svg.png
leisure,en,https://wiki.openstreetmap.org/wiki/Key:leisure,2025-02-28,12,1084,374,180,232.43,https://upload.wikimedia.org/wikipedia/commons/thumb/e/e6/Hammock_-_Polynesia.jpg/200px-Hammock_-_Polynesia.jpg
leisure,fr,https://wiki.openstreetmap.org/wiki/FR:Key:leisure,2021-12-29,11,951,360,186,232.43,https://upload.wikimedia.org/wikipedia/commons/thumb/e/e6/Hammock_-_Polynesia.jpg/200px-Hammock_-_Polynesia.jpg
addr:suburb,en,https://wiki.openstreetmap.org/wiki/Key:addr:suburb,2024-02-24,14,439,89,11,1.49,https://upload.wikimedia.org/wikipedia/commons/thumb/b/bb/Grosvenor_Place_2_2008_06_19.jpg/200px-Grosvenor_Place_2_2008_06_19.jpg
addr:suburb,fr,https://wiki.openstreetmap.org/wiki/FR:Key:addr:suburb,2024-02-18,13,418,87,11,1.49,https://upload.wikimedia.org/wikipedia/commons/thumb/b/bb/Grosvenor_Place_2_2008_06_19.jpg/200px-Grosvenor_Place_2_2008_06_19.jpg
ele,en,https://wiki.openstreetmap.org/wiki/Key:ele,2025-07-18,18,1846,165,24,104.45,https://wiki.openstreetmap.org/w/images/a/a3/Key-ele_mapnik.png
ele,fr,https://wiki.openstreetmap.org/wiki/FR:Key:ele,2024-03-02,15,1277,128,13,104.45,https://wiki.openstreetmap.org/w/images/a/a3/Key-ele_mapnik.png
tracktype,en,https://wiki.openstreetmap.org/wiki/Key:tracktype,2024-12-02,16,652,146,35,32.71,https://wiki.openstreetmap.org/w/images/thumb/1/13/Tracktype-collage.jpg/200px-Tracktype-collage.jpg
tracktype,fr,https://wiki.openstreetmap.org/wiki/FR:Key:tracktype,2025-05-03,11,463,105,29,32.71,https://wiki.openstreetmap.org/w/images/thumb/1/13/Tracktype-collage.jpg/200px-Tracktype-collage.jpg
addr:neighbourhood,en,https://wiki.openstreetmap.org/wiki/Key:addr:neighbourhood,2025-04-29,24,2020,235,83,100,https://wiki.openstreetmap.org/w/images/thumb/e/e9/Housenumber-karlsruhe-de.png/200px-Housenumber-karlsruhe-de.png
addr:hamlet,en,https://wiki.openstreetmap.org/wiki/Key:addr:hamlet,2024-12-05,9,142,64,11,100,https://upload.wikimedia.org/wikipedia/commons/thumb/b/bb/Grosvenor_Place_2_2008_06_19.jpg/200px-Grosvenor_Place_2_2008_06_19.jpg
addr:province,en,https://wiki.openstreetmap.org/wiki/Key:addr:province,2022-05-04,9,156,64,11,100,https://upload.wikimedia.org/wikipedia/commons/thumb/4/4b/Stamp_of_Indonesia_-_2002_-_Colnect_265917_-_Aceh_Province.jpeg/200px-Stamp_of_Indonesia_-_2002_-_Colnect_265917_-_Aceh_Province.jpeg
leaf_type,en,https://wiki.openstreetmap.org/wiki/Key:leaf_type,2025-01-22,15,739,201,57,114.46,https://upload.wikimedia.org/wikipedia/commons/thumb/3/39/Picea_abies_Nadelkissen.jpg/200px-Picea_abies_Nadelkissen.jpg
leaf_type,fr,https://wiki.openstreetmap.org/wiki/FR:Key:leaf_type,2023-07-02,14,734,220,64,114.46,https://upload.wikimedia.org/wikipedia/commons/thumb/3/39/Picea_abies_Nadelkissen.jpg/200px-Picea_abies_Nadelkissen.jpg
addr:full,en,https://wiki.openstreetmap.org/wiki/Key:addr:full,2025-04-29,24,2020,235,83,100,https://wiki.openstreetmap.org/w/images/thumb/e/e9/Housenumber-karlsruhe-de.png/200px-Housenumber-karlsruhe-de.png
Anatomie_des_étiquettes_osm,en,https://wiki.openstreetmap.org/wiki/Anatomie_des_étiquettes_osm,2025-06-08,22,963,53,0,100,
Tag:leisure=children_club,en,https://wiki.openstreetmap.org/wiki/Tag:leisure=children_club,2025-02-02,9,163,69,9,56.04,https://wiki.openstreetmap.org/w/images/thumb/7/76/Osm_element_node.svg/30px-Osm_element_node.svg.png
Tag:leisure=children_club,fr,https://wiki.openstreetmap.org/wiki/FR:Tag:leisure=children_club,2024-05-02,8,294,67,10,56.04,https://upload.wikimedia.org/wikipedia/commons/thumb/7/74/Dave_%26_Buster%27s_video_arcade_in_Columbus%2C_OH_-_17910.JPG/200px-Dave_%26_Buster%27s_video_arcade_in_Columbus%2C_OH_-_17910.JPG
Tag:harassment_prevention=ask_angela,en,https://wiki.openstreetmap.org/wiki/Tag:harassment_prevention=ask_angela,2025-02-22,14,463,72,9,42.56,https://wiki.openstreetmap.org/w/images/thumb/7/76/Osm_element_node.svg/30px-Osm_element_node.svg.png
Tag:harassment_prevention=ask_angela,fr,https://wiki.openstreetmap.org/wiki/FR:Tag:harassment_prevention=ask_angela,2025-09-01,20,873,166,15,42.56,https://wiki.openstreetmap.org/w/images/thumb/1/15/2024-06-27T08.40.50_ask_angela_lyon.jpg/200px-2024-06-27T08.40.50_ask_angela_lyon.jpg
Key:harassment_prevention,en,https://wiki.openstreetmap.org/wiki/Key:harassment_prevention,2024-08-10,12,196,69,14,66.72,https://wiki.openstreetmap.org/w/images/thumb/7/76/Osm_element_node.svg/30px-Osm_element_node.svg.png
Key:harassment_prevention,fr,https://wiki.openstreetmap.org/wiki/FR:Key:harassment_prevention,2025-07-03,15,328,83,14,66.72,https://wiki.openstreetmap.org/w/images/thumb/7/76/Osm_element_node.svg/30px-Osm_element_node.svg.png
Proposal process,en,https://wiki.openstreetmap.org/wiki/Proposal process,2025-08-13,46,5292,202,4,166.25,https://wiki.openstreetmap.org/w/images/thumb/c/c2/Save_proposal_first.png/761px-Save_proposal_first.png
Proposal process,fr,https://wiki.openstreetmap.org/wiki/FR:Proposal process,2023-09-22,15,1146,24,0,166.25,
Automated_Edits_code_of_conduct,en,https://wiki.openstreetmap.org/wiki/Automated_Edits_code_of_conduct,2025-07-26,19,2062,69,0,26.35,
Automated_Edits_code_of_conduct,fr,https://wiki.openstreetmap.org/wiki/FR:Automated_Edits_code_of_conduct,2025-04-03,17,1571,16,0,26.35,
Key:cuisine,en,https://wiki.openstreetmap.org/wiki/Key:cuisine,2025-07-23,17,3422,693,303,107.73,https://upload.wikimedia.org/wikipedia/commons/thumb/f/f0/Food_montage.jpg/200px-Food_montage.jpg
Key:cuisine,fr,https://wiki.openstreetmap.org/wiki/FR:Key:cuisine,2024-02-16,15,2866,690,316,107.73,https://upload.wikimedia.org/wikipedia/commons/thumb/f/f0/Food_montage.jpg/200px-Food_montage.jpg
Libre_Charge_Map,en,https://wiki.openstreetmap.org/wiki/Libre_Charge_Map,2025-07-28,11,328,10,2,100,https://wiki.openstreetmap.org/w/images/thumb/8/8e/Screenshot_2025-07-28_at_14-40-11_LibreChargeMap_-_OSM_Bliss.png/300px-Screenshot_2025-07-28_at_14-40-11_LibreChargeMap_-_OSM_Bliss.png
OSM_Mon_Commerce,en,https://wiki.openstreetmap.org/wiki/OSM_Mon_Commerce,2025-07-29,17,418,34,3,100,https://wiki.openstreetmap.org/w/images/thumb/6/67/Villes_OSM_Mon_Commerce.png/500px-Villes_OSM_Mon_Commerce.png
Tag:amenity=charging_station,en,https://wiki.openstreetmap.org/wiki/Tag:amenity=charging_station,2025-08-29,16,1509,284,62,55.72,https://wiki.openstreetmap.org/w/images/thumb/4/4d/Recharge_Vigra_charging_station.jpg/200px-Recharge_Vigra_charging_station.jpg
Tag:amenity=charging_station,fr,https://wiki.openstreetmap.org/wiki/FR:Tag:amenity=charging_station,2024-12-28,19,2662,331,58,55.72,https://wiki.openstreetmap.org/w/images/thumb/4/4d/Recharge_Vigra_charging_station.jpg/200px-Recharge_Vigra_charging_station.jpg