Shortcuts: WD:RAQ, w.wiki/LX

Wikidata:Request a query: Difference between revisions

From Wikidata
Jump to navigation Jump to search
Content deleted Content added
→‎Room Clean: new section
Tags: Reverted New topic
Revibot I (talk | contribs)
m Bot: Archiving 2 threads (older than 14 days) to Wikidata:Request a query/Archive/2024/06
(38 intermediate revisions by 11 users not shown)
Line 10: Line 10:
|archiveheader = {{Archive}}
|archiveheader = {{Archive}}
}}
}}

== VALUES slows down query hugely ==

{{SPARQL |query=
SELECT DISTINCT ?item WHERE {

VALUES ?classes {
wd:Q1785071 # fort
wd:Q56344492 # artillery battery
}

?item wdt:P31/wdt:P279* ?classes.

?item wdt:P17 wd:Q145.
}
}}
Takes 30 s, while
{{SPARQL|query=
SELECT DISTINCT ?item WHERE {

VALUES ?classes {
wd:Q1785071 # fort
# wd:Q56344492 # artillery battery
}

?item wdt:P31/wdt:P279* ?classes.

?item wdt:P17 wd:Q145.
}
}}

Takes 0.7 s. Obviously for this trivial case I could use UNION, but in my real case I want to parameterise with a VALUES array. [[User:Vicarage|Vicarage]] ([[User talk:Vicarage|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 21:54, 25 June 2024 (UTC)

:Some sort of optimisation fail, I fear. Oddly, implementng VALUES in a named subquery seems to get over the problem.
:{{SPARQL|query=SELECT DISTINCT ?item WITH {
SELECT ?classes where {
VALUES ?classes {
wd:Q1785071 # fort
wd:Q56344492 # artillery battery
}
} } as %i
WHERE {
INCLUDE %i
?item wdt:P31/wdt:P279* ?classes.
?item wdt:P17 wd:Q145.
}
}} --[[User:Tagishsimon|Tagishsimon]] ([[User talk:Tagishsimon|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 01:54, 26 June 2024 (UTC)

Making progress, but have 2 issues.

1) I'd like to only do the distance check if ?range1 were declared

2) I get a "bad aggregate" when trying to combine ?type1 and ?type2 preferentially

{{SPARQL|query=SELECT DISTINCT ?item ?itemLabel (COALESCE(SAMPLE (DISTINCT ?type1), SAMPLE (DISTINCT ?type2)) AS ?type)
WITH {
SELECT ?instances ?classes (IF(BOUND(?range1),?range1,"100") AS ?range) WHERE {
#### THIS BIT IN INCLUDED FILE, MANY EXAMPLES ####
VALUES ?instances {
wd:Q2772772 # military museum
}
VALUES ?classes {
wd:Q1785071 # fort
wd:Q91122 # bunker
}
VALUES ?range1 {"30"}
#### END ####
} } AS %i
WHERE {
SERVICE wikibase:label { bd:serviceParam wikibase:language "en-GB,en,fr,de,es,pt,pl,nl,cs". }
{wd:Q1180981 wdt:P625 ?loc1} UNION {wd:Q1180981 wdt:P276 ?loc. ?loc wdt:P625 ?loc1}
?item wdt:P17 wd:Q145.
{INCLUDE %i}
{?item wdt:P31/wdt:P279* ?classes. BIND(?classes as ?type2)}
UNION {?item wdt:P31 ?instances. BIND(?instances AS ?type1)}
SERVICE wikibase:around {
?item wdt:P625 ?loc2 .
bd:serviceParam wikibase:center ?loc1 .
bd:serviceParam wikibase:radius ?range .
bd:serviceParam wikibase:distance ?dist.
}
} GROUP BY ?item ?itemLabel
}}

::{{ping|Vicarage}} The aggretation issue is sorted ut in the query above. The final GROUP BY compliments the initial SELECT, so ?itemLabel was required. Not sure where you're going with range: ?range1 is always BOUND in the first, %i, named subquery b/c it is being set by VALUES. The initial SELECT does not select ?range (or ?range1) so you're not getting the range from the lower query. And then, ?range 1 does not seem to be used in the second query at all, and all of the ?items found in the second query will have a ?range b/c they're being found by the wikibase:around function. So. Have a think, let me know. --[[User:Tagishsimon|Tagishsimon]] ([[User talk:Tagishsimon|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 13:29, 26 June 2024 (UTC)

:::{{ping|Tagishsimon}} Thanks for the aggregate solution. I think I'm trying to be too clever, writing a generic back end that can filter on combinations of classes, distances and properties provided in a short source file, with the logic that if the filter values are mentioned, use them, otherwise skip the check entirely, to find items of many different types that are related to an item (geographically, because of subclass, because they are linked by a property of one item that is in the other's database) This requires the programatic 'IF ?thing not blank use ?thing in this check, otherwise skip the check' which doesn't seem available in the language. I think I will need to retreat to a wider range of separate back end queries. [[User:Vicarage|Vicarage]] ([[User talk:Vicarage|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 13:54, 26 June 2024 (UTC)

::::{{ping|Vicarage}} BIND(IF(BOUND(?this),?this,?that) as ?whatever), or BIND(IF(!BOUND(?something_else),?this,?that) as ?whatever) or BIND(IF(?this<30),?this,?that) as ?whatever) sounds like what you may be after ... so you're not deciding whether or not to do the distance calculation, but rather deciding based on some condition whether to use the result? --[[User:Tagishsimon|Tagishsimon]] ([[User talk:Tagishsimon|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 21:56, 26 June 2024 (UTC)
:::::Yes, but I can't do the distance calculation if the item doesn't have a P625, or check against a list of properties if the list is blank. Each time I come up with a generic solution it either won't accept nulls, or runs unreasonably slowly. Its very frustrating that a few dozen results can time out. [[User:Vicarage|Vicarage]] ([[User talk:Vicarage|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 22:17, 26 June 2024 (UTC)
::::::For example, this, which uses 2 techniques you showed me, times out
{{SPARQL|query=
SELECT DISTINCT ?item ?type
(SAMPLE (DISTINCT ?sta) as ?subtypeLabel)
WITH {SELECT ?classes1# ?classes2 #?range ?props
WHERE {
VALUES ?classes1 {
wd:Q1785071 # fort
}
VALUES ?classes2 {
wd:Q91122 # bunker
}
VALUES ?range {"15"}
} } AS %i
WHERE {
SERVICE wikibase:label { bd:serviceParam wikibase:language "en-GB,en,fr,de,es,pt,pl,nl,cs". }
{INCLUDE %i}
wd:Q1180981 wdt:P17 ?country.
?item wdt:P17 ?country.
{?item wdt:P31/wdt:P279* ?classes1}
UNION {?item wdt:P31/wdt:P279* ?classes2}
{wd:Q1180981 wdt:P625 ?loc1} UNION {wd:Q1180981 wdt:P276 ?loc. ?loc wdt:P625 ?loc1}
SERVICE wikibase:around {
?item wdt:P625 ?loc2.
bd:serviceParam wikibase:center ?loc1.
bd:serviceParam wikibase:radius "15".
bd:serviceParam wikibase:distance ?dist.
}
MINUS {VALUES ?item {wd:Q1180981} } # itself
OPTIONAL {?item wdt:P1448 ?labellist. FILTER (lang(?labellist) = "mul")} # official name
{SERVICE wikibase:label {bd:serviceParam wikibase:language "en" . ?item rdfs:label ?label1} }
}
#GROUP by ?item ?starts ?ends ?dist ?label1 ?type
GROUP by ?item ?type ?label1
}} [[User:Vicarage|Vicarage]] ([[User talk:Vicarage|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 22:25, 26 June 2024 (UTC)

:{{ping|Vicarage}} SPARQL will be the death of us. I made two changes: uncommented ?classes2 in the top query b/c otherwise that variable is unbound in the second query which would mean it was asking for a P31/P279* of everything; and then added a runfirst hint to the code which fetches ?loc1. Now 1.2 seconds runtime.

:I stripped down the query and then added clauses to find out when it slowed down. I played with the runfirst b/c the sooner the number of possible values of a variable such as ?item can be restricted, the better. The query is making two calls to the label service, which is probably a bad idea. There may be scope for further hints down the line; particularly <code>hint:Prior hint:gearing "forward".</code> after each P31/P279* clause, within their {braces}, so the optimiser works from the ?item to the ?classes1 and ?classes2 rather than from ?classes1 to the ?item.

:{{SPARQL|query=SELECT DISTINCT ?item ?type (SAMPLE (DISTINCT ?sta) as ?subtypeLabel) WITH {
SELECT ?classes1 ?classes2 #?range ?props
WHERE {
VALUES ?classes1 {
wd:Q1785071 # fort
}
VALUES ?classes2 {
wd:Q91122 # bunker
}
VALUES ?range {"15"}
} } AS %i
WHERE {
SERVICE wikibase:label { bd:serviceParam wikibase:language "en-GB,en,fr,de,es,pt,pl,nl,cs". }
INCLUDE %i
wd:Q1180981 wdt:P17 ?country.
?item wdt:P17 ?country.
{?item wdt:P31/wdt:P279* ?classes1 .} UNION
{?item wdt:P31/wdt:P279* ?classes2 .}
{ {wd:Q1180981 wdt:P625 ?loc1.} UNION
{wd:Q1180981 wdt:P276 ?loc.
?loc wdt:P625 ?loc1} } hint:Prior hint:runFirst true.
SERVICE wikibase:around {
?item wdt:P625 ?loc2.
bd:serviceParam wikibase:center ?loc1.
bd:serviceParam wikibase:radius "15".
bd:serviceParam wikibase:distance ?dist.
}
MINUS {VALUES ?item {wd:Q1180981} } # itself
OPTIONAL {?item wdt:P1448 ?labellist. FILTER (lang(?labellist) = "mul")} # official name
{SERVICE wikibase:label {bd:serviceParam wikibase:language "en" . ?item rdfs:label ?label1} }
}
#GROUP by ?item ?starts ?ends ?dist ?label1 ?type
GROUP by ?item ?type ?label1
}} --[[User:Tagishsimon|Tagishsimon]] ([[User talk:Tagishsimon|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 22:56, 26 June 2024 (UTC)
::Argh, how could it be so dumb to not find a location before using it! Each time I prepare example code I strip out all the variants and hints I tried out, so commented out sections get through. It still slows if you multiply up the number of ?classes1 entries, but with judicial pruning, I've managed to get the Dreadnought which defeated us last week to under a minute. [[User:Vicarage|Vicarage]] ([[User talk:Vicarage|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 06:35, 27 June 2024 (UTC)

== Books with an argentinian author ==

Hi! I was trying to build a query to obtain works (p31 = Q3331189) with an author (P50) whose nationality is argentinian (P27 = Q414). I believe i should use UNION..but im kind of lost, maybe with FIlTER? I appreciate any help :) [[User:Mauricio V. Genta|Mauricio V. Genta]] ([[User talk:Mauricio V. Genta|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 16:43, 26 June 2024 (UTC)

:{{ping|Mauricio V. Genta}} Right now, you'd be wanting something like this. Come back & tell me if you want the query to head in any other direction.

:{{SPARQL|query=SELECT ?author ?authorLabel ?edition ?editionLabel WHERE {

?author wdt:P27 wd:Q414. # Argentinian author
?edition wdt:P50 ?author . # is an author of something
?edition wdt:P31 wd:Q3331189 . # something is an edition, translation &c.
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],es,en". }
}
}} --[[User:Tagishsimon|Tagishsimon]] ([[User talk:Tagishsimon|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 21:48, 26 June 2024 (UTC)
::Thanks! It was quite simple, i have to overthink less. [[User:Mauricio V. Genta|Mauricio V. Genta]] ([[User talk:Mauricio V. Genta|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 03:22, 27 June 2024 (UTC)

== IPNI but no Wikispecies article ==

Please can we have a query to find people with a value for {{P|P586}}, but no article on Wikispecies? <span class="vcard"><span class="fn">[[User:Pigsonthewing|Andy Mabbett]]</span> (<span class="nickname">Pigsonthewing</span>); [[User talk:Pigsonthewing|Talk to Andy]]; [[Special:Contributions/Pigsonthewing|Andy's edits]]</span> 19:49, 27 June 2024 (UTC)

:{{ping|Pigsonthewing}}
:{{SPARQL|query=SELECT ?item ?itemLabel ?IPNI WHERE {
?item wdt:P586 ?IPNI .
filter not exists { ?article schema:about ?item ;
schema:isPartOf <https://species.wikimedia.org/> . }
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
}} --[[User:Tagishsimon|Tagishsimon]] ([[User talk:Tagishsimon|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 20:26, 27 June 2024 (UTC)

== Query for people birth details ==

Hi there! I'm trying to look for people & their birth details, like time & place. As the results for that query are of course too big, and times out, I tried adding some extra "filters", like being females or having an image. But even then, it's still failing. Any ideas to divide this into parts that I can later join to get the whole set? Here's a current query example (that include people article's URL for Spanish):
:{{SPARQL|query=PREFIX schema: <http://schema.org/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT
DISTINCT
?item ?itemLabel ?itemDescription ?wdimage # WHAT
?Sdate ?SdatePrecision # WHEN
?where ?whereLabel # WHERE
?articleUrl
WHERE {
?item wdt:P31 wd:Q5.
?item wdt:P21 wd:Q6581072. # female
?item wdt:P18 ?wdimage.
?item wdt:P19 ?where.
?item p:P569/psv:P569 ?SdateNode.
?SdateNode wikibase:timeValue ?Sdate.
?SdateNode wikibase:timePrecision ?SdatePrecision.
OPTIONAL { ?articleUrl schema:about ?item .
?articleUrl schema:inLanguage "es" .
?articleUrl schema:isPartOf <https://es.wikipedia.org/> . }
SERVICE wikibase:label { bd:serviceParam wikibase:language "es" }
}
}} [[User:Pruna.ar|Pruna.ar]] ([[User talk:Pruna.ar|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 21:56, 29 June 2024 (UTC)

:{{ping|Pruna.ar}} Blazegraph, Wikidata's current report engine, has a slice service allowing the user to iterate through complete sets of triples, using an offset and limit pair of parameters to identify where in the set index to start, and how many triples to go through. So for your query, something like the below would be the way to go. You'll have to experiment with the two parameters; you can remove other clauses which you may have introduced only to cut down the number of qualifying items.

:{{SPARQL|query=
PREFIX schema: <http://schema.org/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT
DISTINCT
?item ?itemLabel ?itemDescription ?wdimage # WHAT
?Sdate ?SdatePrecision # WHEN
?where ?whereLabel # WHERE
?articleUrl
WHERE {
SERVICE bd:slice {
?item p:P569 [].
bd:serviceParam bd:slice.offset 0 . # Start at item number (not to be confused with QID)
bd:serviceParam bd:slice.limit 100000 . # List this many items
}
?item wdt:P31 wd:Q5.
?item wdt:P21 wd:Q6581072. # female
?item wdt:P18 ?wdimage.
?item wdt:P19 ?where.
?item p:P569/psv:P569 ?SdateNode.
?SdateNode wikibase:timeValue ?Sdate.
?SdateNode wikibase:timePrecision ?SdatePrecision.
OPTIONAL { ?articleUrl schema:about ?item .
?articleUrl schema:inLanguage "es" .
?articleUrl schema:isPartOf <https://es.wikipedia.org/> . }
SERVICE wikibase:label { bd:serviceParam wikibase:language "es" }
}
}} --[[User:Tagishsimon|Tagishsimon]] ([[User talk:Tagishsimon|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 23:08, 29 June 2024 (UTC)


== Speeding up a query of the heritage properties of castles ==
== Speeding up a query of the heritage properties of castles ==
Line 503: Line 246:


I tried to make a graph of people related to some specific organisations (associations or less formal groups with {{P|P463}} and political parties with {{P|P102}}), which I will call the “first people” or people from the first criteria, as well as the relatives they have in common (with a few different family properties).<br />
I tried to make a graph of people related to some specific organisations (associations or less formal groups with {{P|P463}} and political parties with {{P|P102}}), which I will call the “first people” or people from the first criteria, as well as the relatives they have in common (with a few different family properties).<br />
The explanations are probably quite confused to read, there is a shorter FILTERing formula closer to the end of this post which makes the logic more understandable.<br />
The explanations are probably quite confusing to read, there is a shorter FILTERing formula closer to the end of this post which makes the logic more understandable.<br />
As there are hundreds of people in these groups, I want to filter on several criteria. I want people connected to at least two groups or parties, excepted if it is only to political parties and no group, as several people are only changing parties and it is not interesting to include them. I also want some of the relatives of these people; the relatives appearing should be either connected to at least two of the “first people” themselves directly connected to groups (even if these “first people” are only connected to one group or party and therefore excluded from the first criteria), or be connected to one person connected to groups + connected to one group or party themselves.
As there are hundreds of people in these groups, I want to filter on several criteria. I want people connected to at least two groups or parties, excepted if it is only to political parties and no group, as several people are only changing parties and it is not interesting to include them. I also want some of the relatives of these people; the relatives appearing should be either connected to at least two of the “first people” themselves directly connected to groups (even if these “first people” are only connected to one group or party and therefore excluded from the first criteria), or be connected to one person connected to groups + connected to one group or party themselves.


I first went with a UNION to get people linked to groups and parties, and got different COUNTs on each, which we will call ?cG and ?cP, respectively for the number of groups and the number of political parties the person is connected with.
I first went with a UNION to get people linked to groups and parties, and got different COUNTs on each, which we will call ?cG and ?cP, respectively for the number of groups and the number of political parties the person is connected with.


For the relatives’ part, I went with a subquery reproducing the first UNION, added the relatives (I filtered on the superior than 0 to eliminate a NULL value otherwise appearing), and BINDed the ?relative variable on the ?item variable in order to UNION this section with the first part and counted the connections to them with ?cR. The further FILTER operations with the COUNTs need to be done outside of the UNIONs, as it will be their different sums which will include or not the people in the final request.
For the relatives’ part, I went with a subquery reproducing the first UNION, added the relatives, and name the ?relative variable as the ?item variable in order to UNION this section with the first part and counted the connections to the relatives with ?cR. The further FILTER operations with the COUNTs will need to be done outside of the UNIONs, as it will be their different sums which will include or not the people in the final request.


However I failed to properly write the UNIONs. I succeeded getting a result by having the two first COUNTs separated from the relatives’ one, but this way it does not group/merge the items in one, and the people appearing in both subqueries get two lines. This is now my main problem, all the rest can be ignored, I should be able to continue after solving this merging problem.
However I failed to properly write the UNIONs. I succeeded getting a result by having the two first COUNTs separated from the relatives’ one, but this way it does not group/merge the items in one, and the people appearing in both subqueries get two lines. This is now my main problem, all the rest can be ignored, I should be able to continue after solving this merging problem.
Line 515: Line 258:


{{SPARQL
{{SPARQL
| query = SELECT DISTINCT ?item ?itemLabel (count (?memberOf) AS ?cG) (count (?memberOfPP) AS ?cP) ?cR WHERE {
| query = SELECT DISTINCT ?item ?itemLabel (count (?memberOf) AS ?cG) (count (?memberOfPP) AS ?cP) ?cR WHERE {
{
VALUES ?memberOf { a bunch of wd:Q… }.
{
?item wdt:P463 ?memberOf.
VALUES ?memberOf { a bunch of wd:Q… } .
}
?item wdt:P463 ?memberOf .
}
UNION
UNION
{
VALUES ?memberOfPP { another bunch of wd:Q… }.
{
?item wdt:P102 ?memberOfPP.
VALUES ?memberOfPP { another bunch of wd:Q… } .
}
?item wdt:P102 ?memberOfPP .
}
UNION
UNION
{
SELECT (?relative AS ?item) (count (?relative) AS ?cR) WHERE {
{
SELECT ?item ?cR WHERE {
SELECT DISTINCT ?item ?relative WHERE {
{
{
SELECT ?relative (count (?relative) AS ?cR) WHERE {
VALUES ?memberOf { a bunch of wd:Q… again }.
SELECT DISTINCT ?item ?relative WHERE {
?item wdt:P463 ?memberOf.
{
}
UNION
VALUES ?memberOf { a bunch of wd:Q… again } .
{
?item wdt:P463 ?memberOf .
}
VALUES ?memberOfPP { another bunch of wd:Q… again }.
UNION
?item wdt:P102 ?memberOfPP.
{
}
?relative ^(wdt:P22{{!}}wdt:P25{{!}}wdt:P26{{!}}wdt:P40{{!}}wdt:P451{{!}}wdt:P1038{{!}}wdt:P3373{{!}}wdt:P3448) ?item.
VALUES ?memberOfPP { another bunch of wd:Q… again } .
}
?item wdt:P102 ?memberOfPP .
}
} GROUP BY ?relative
OPTIONAL { ?item (wdt:P22{{!}}wdt:P25{{!}}wdt:P26{{!}}wdt:P40{{!}}wdt:P451{{!}}wdt:P1038{{!}}wdt:P3373{{!}}wdt:P3448) ?relative }.
}
} GROUP BY ?relative HAVING (?cR > 0) }
BIND (?relative AS ?item).
}
}
}
SERVICE wikibase:label {bd:serviceParam wikibase:language "fr,en" }
SERVICE wikibase:label {bd:serviceParam wikibase:language "fr,en". }
} GROUP BY ?item ?itemLabel ?cR
<nowiki>}</nowiki>
}}
}}


Line 559: Line 298:
Thanks in advance! --[[User:Bischnu|Bischnu]] ([[User talk:Bischnu|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 23:33, 7 July 2024 (UTC)
Thanks in advance! --[[User:Bischnu|Bischnu]] ([[User talk:Bischnu|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 23:33, 7 July 2024 (UTC)


:After letting the query aside for a few days, I succeeded in getting what I wanted. The correct query is:
== Room Clean ==


:{{SPARQL
Cleaned Room [[Special:Contributions/2603:9001:5F00:7BD8:89F7:3EDD:ED98:35D9|2603:9001:5F00:7BD8:89F7:3EDD:ED98:35D9]] 15:11, 8 July 2024 (UTC)
| query = SELECT DISTINCT ?item ?itemLabel (count (?memberOf) AS ?cG) (count (?memberOfPP) AS ?cP) (count (?source) AS ?cR) WHERE {
{
VALUES ?memberOf { a bunch of wd:Q… }.
?item wdt:P463 ?memberOf.
}
UNION
{
VALUES ?memberOfPP { another bunch of wd:Q… }.
?item wdt:P102 ?memberOfPP.
}
UNION
{
SELECT DISTINCT ?source ?item WHERE {
{
VALUES ?memberOf { a bunch of wd:Q… again }.
?source wdt:P463 ?memberOf.
}
UNION
{
VALUES ?memberOfPP { another bunch of wd:Q… again }.
?source wdt:P102 ?memberOfPP.
}
?item ^(wdt:P22{{!}}wdt:P25{{!}}wdt:P26{{!}}wdt:P40{{!}}wdt:P451{{!}}wdt:P1038{{!}}wdt:P3373{{!}}wdt:P3448) ?source.
}
}
SERVICE wikibase:label {bd:serviceParam wikibase:language "fr,en". }
} GROUP BY ?item ?itemLabel ?cR
}}

== Agents linked to an instition (collection) ==

Hi,
I am trying to query and then visualize the people, i.e. staff members, collection agents and more connected to my institution which houses a large collection (http://www.wikidata.org/entity/Q233098).

The people are connected via:
P108: employer
P1416: affiliation
P485: archives at
P11146: collection items at

I have created this easy query but then the people are not linked...
https://w.wiki/AcpG
Since the Archive and Herbarium are part of the main institution, all people are connected to [[Q233098]]. Some are connected with each other (via the property 'significant person' as co-authors or co-collectors, some in academic relations (doctoral adivisor, doctoral student) but not so many yet. More work needs to be done.
Occupations could maybe be useful to group the agents.

Any suggestions how best to visualize this in a graph of people connected to the museum?
Thank you! [[User:S.v.Mering|S.v.Mering]] ([[User talk:S.v.Mering|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 13:04, 10 July 2024 (UTC)

:Hello,
:From what I know, in order to visualise the museums and links to the museums, you have to add the column with the museum in the SELECT, so each row would be a combination of a person and the museum it is linked to.<br />Here is an example based on your query: https://w.wiki/Aczp, which is not very elegant as I let it in different UNIONs and different variables. I did not know if you wanted to keep the exact result you already had (so the employer and affiliation properties only applies to the first museum, the archives at only to the second…), or wanted to have every properties for all museums. If that is the case, you could replace some properties’ links with an OR symbol (for example: ?item ((wdt:P108|wdt:P1416)/(wdt:P279*)) ?museumLinked. ) and the entities in the VALUES with the three museums. This does exactly the same but I joined some UNION together and called the variables under the same name which makes it more simple to read but you lose the kind of affiliation to the museum: https://w.wiki/Ac$o.<br />I am not very good in SPARQL, I hope it still helped you. --[[User:Bischnu|Bischnu]] ([[User talk:Bischnu|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 20:33, 10 July 2024 (UTC)
::Thank you, [[User: Bischnu | Bischnu]]! This is helping me a lot already.
::The issue with the archive and herbarium is that they are part of the same (main) museum that was used for affiliation/employer but I did not know how to include the parts of the organization in the original query (e.g. for 'collection items at').
::As it is now, I can see three sub-graphs, which is also quite nice. I will need to play around a little and try it out.
::Thank you for your help, very much appreciated! [[User:S.v.Mering|S.v.Mering]] ([[User talk:S.v.Mering|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 23:18, 10 July 2024 (UTC)

== Match string in text property values ==

How can we find sub-strings in values of {{P|P2093}}, like "Review By:" seen [https://www.wikidata.org/w/index.php?title=Q89995107&diff=prev&oldid=2201538800 in this fix]? <span class="vcard"><span class="fn">[[User:Pigsonthewing|Andy Mabbett]]</span> (<span class="nickname">Pigsonthewing</span>); [[User talk:Pigsonthewing|Talk to Andy]]; [[Special:Contributions/Pigsonthewing|Andy's edits]]</span> 15:56, 11 July 2024 (UTC)

:There don't seem to be any other example of the Review By problem, so I'll illustrate one approach to the report using the Charles E Griswold string:
:{{SPARQL|query=SELECT DISTINCT ?item ?itemLabel
WHERE {
hint:Query hint:optimizer "None".
SERVICE wikibase:mwapi {
bd:serviceParam wikibase:api "Search";
wikibase:endpoint "www.wikidata.org";
mwapi:srsearch "Charles E Griswold".
?item wikibase:apiOutputItem mwapi:title .
}
?item wdt:P2093 ?string .
filter(regex(".*charles e griswold.*",lcase(?string)))
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
}} --[[User:Tagishsimon|Tagishsimon]] ([[User talk:Tagishsimon|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 18:26, 11 July 2024 (UTC)
:Hello,
:I think that it can be done with the function CONTAINS. You would apply it somehow like this:
:{{SPARQL
| query = SELECT ?item ?ans WHERE {
?item wdt:P31 wd:Q13442814;
wdt:P2093 ?ans.
FILTER(CONTAINS(?ans,"Review by:")).
} LIMIT 10
}}
:It is probably better to restrict on a smaller subset than all scholarly articles as I did there, this request does not succeed and times out. Also, if you are sure that the string you are looking for is always at the beginning, it is also probably better to replace “CONTAINS” by “STRSTARTS”; the latter normally needs less resources to compare the strings. --[[User:Bischnu|Bischnu]] ([[User talk:Bischnu|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 18:37, 11 July 2024 (UTC)

::{{P|2093}} is used ~139M times, mostly on {{Q|13442814}} items. A query predicated on filtering 139M P2093 values is bound to fail given a 60s runtime. The reason that it is appropriate to approach this problem using MWAPI Search is that it can locate the small set of broadly qualifying items, after which a filter can be brought to bear on that small set. --[[User:Tagishsimon|Tagishsimon]] ([[User talk:Tagishsimon|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 19:32, 11 July 2024 (UTC)
:::OK thanks, great to know! I do not understand how it works. Does the MWAPI only look for the string in all the titles / item labels, after which we search for the author string in these items?<br />Also, I just had a look on what is MWAPI. I saw that you could find a string in all Wikipedia articles for example? How could something that huge not fail? [[User:Bischnu|Bischnu]] ([[User talk:Bischnu|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 19:48, 11 July 2024 (UTC)
::::It's a search engine - specifically CirrusSearch [https://www.mediawiki.org/wiki/Extension:CirrusSearch], based on [[w:en:Elasticsearch]]. Search engines are designed to index large text corpii and return useful results based on search strings. MWAPI is just a route to access Cirrus. I'm not sure offhand which parts of a wikidata item are indexed, but it's much more than labels and descriptions; certainly, P2093 is indexed, else we would not have got results in the above query. There's also an extension to Cirrus customised for wikidata - Extension:WikibaseCirrusSearch [https://www.mediawiki.org/wiki/Help:Extension:WikibaseCirrusSearch] which allows searches to combine text strings with statement values, so <code>mwapi:srsearch "numismatic haswbstatement:P31=Q13442814".</code>, for instance. --[[User:Tagishsimon|Tagishsimon]] ([[User talk:Tagishsimon|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 20:52, 11 July 2024 (UTC)

{{od}}
Thank you, [[User:Tagishsimon|Simon]]. Your example works, but [https://w.wiki/Ae93 a version] searching for the string "Andy" finds zero resuts. <span class="vcard"><span class="fn">[[User:Pigsonthewing|Andy Mabbett]]</span> (<span class="nickname">Pigsonthewing</span>); [[User talk:Pigsonthewing|Talk to Andy]]; [[Special:Contributions/Pigsonthewing|Andy's edits]]</span> 08:31, 12 July 2024 (UTC)
:Yes. Odd. --[[User:Tagishsimon|Tagishsimon]] ([[User talk:Tagishsimon|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 11:47, 12 July 2024 (UTC)

== Pipe symbol in REPLACE ==

How can I specify the pipe symbol '|' in a REPLACE?
REPLACE(?entry,"\|","%7C")
returns
Query is malformed: Lexical error at line 48, column 23. Encountered: "|" (124), after : "\"\\" [[User:Vicarage|Vicarage]] ([[User talk:Vicarage|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 09:36, 12 July 2024 (UTC)

:REPLACE(?entry,"\\|","%7C") --[[User:Tagishsimon|Tagishsimon]] ([[User talk:Tagishsimon|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 11:46, 12 July 2024 (UTC)

== Only wikipedia sitelinks ==

Hi. Would it be possible to change [https://w.wiki/AeEZ this query] to only take into account wikipedia sitelinks and not wikisource, commons, etc. ones? Thanks in advance, [[User:Paucabot|Paucabot]] ([[User talk:Paucabot|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 12:53, 12 July 2024 (UTC)
:In what way? If you simply want to list entities that have at least one wikipedia sitelink you can do that by adding a semi-join: https://w.wiki/Aena . But if you want the count of sitelinks only to wikipedia that will require a lot more work. I might be too lazy to write the latter. [[User:Infrastruktur|Infrastruktur]] ([[User talk:Infrastruktur|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 12:35, 13 July 2024 (UTC)
:: I think I'm asking for the latter. [[User:Paucabot|Paucabot]] ([[User talk:Paucabot|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 20:29, 13 July 2024 (UTC)

== Conflation query barely returns any results ==

Not sure why this query is returning so few results. It only returns 29 items, but I expected it to return a lot more. {{Q|17228961}} is one item that has both {{Q|14946528}} and {{Q|63952888}}, so not sure why it isn't part of the results too. Any help would be really appreciated!

:{{SPARQL
| query =
SELECT ?item ?itemLabel (GROUP_CONCAT(?instanceOfLabel; SEPARATOR=", ") AS ?instanceOfValues) WHERE {
?item wdt:P31 wd:Q14946528.
?item wdt:P31 ?type.
VALUES ?type { wd:Q63952888 wd:Q74262765 }
?item wdt:P31 ?instanceOf.
?instanceOf rdfs:label ?instanceOfLabel.
FILTER(LANG(?instanceOfLabel) = "en")
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?item ?itemLabel
}} [[User:Finnius00|Finnius00]] ([[User talk:Finnius00|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 22:36, 12 July 2024 (UTC)

Revision as of 03:05, 14 July 2024

Request a query
Fishing in the Wikidata river requires both an idea where to look for fish and a suitable fishing method. If you have the former, this page can help you find the latter.

This is a page where SPARQL 1.1 Query Language (Q32146616) queries can be requested. Please provide feedback if a query is written for you.

For sample queries, see Examples and Help:Dataset sizing. Property talk pages include also summary queries for these.

For help writing your own queries, or other questions about queries, see Wikidata talk:SPARQL query service/queries and Wikidata:SPARQL query service/query optimization.

Help resources about Wikidata Query Service (Q20950365) and SPARQL: Wikidata:SPARQL query service/Wikidata Query Help and Category:SPARQL.

To report an issue about the Query Service (interface, results views, export...) please see Wikidata:Contact the development team/Query Service and search.
On this page, old discussions are archived. An overview of all archives can be found at this page's archive index. The current archive is located at 2024/08.

Speeding up a query of the heritage properties of castles

Do you have any suggestions how to speed this up, which given a list of 500 British castles tries to find out their heritage register properties. If I restrict the registers to UK only, it runs in 25s, but I ideally I want the register to be international, like ‎Burgenwelt ID (P12823)

SELECT DISTINCT ?item ?itemLabel ?type 
  (CONCAT("+",STR(COALESCE(?pointintime,?starttime))) AS ?start)
  ?description ?notes ?link ?title ?keywords WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en-GB,en,fr,de,es,pt,pl,nl,cs". }
  { 
    SELECT DISTINCT ?item ?type ?pointintime ?starttime ?description ?notes ?link ?title ?keywords WHERE {
# places that need tripadvisor entries to be notable
VALUES ?notable {wd:Q23413 wd:Q16748868} # castle, city walls

?item wdt:P31/wdt:P279* ?notable.
?item wdt:P3134 ?tripadvisor.

?item wdt:P17 wd:Q145.

VALUES ?registerclass {
 wd:Q18618628 # cultural heritage
  wd:Q19829908 # places
  wd:Q22964288 # military
  wd:Q23779665 # maritime
  wd:Q74568206 # archives
}
      {
          ?registerclass ^wdt:P31 ?register.

         ?register wikibase:claim ?claim .
          ?register wikibase:statementProperty ?value.

          ?item ?claim ?stat .
          ?stat ?value ?entry .
          OPTIONAL {
          ?register wdt:P2378 ?issuer.
          ?issuer rdfs:label ?issuerlabel. FILTER (LANG(?issuerlabel) = "en")
          ?register wdt:P1630 ?URLprefix.
                  {?stat pq:P1435 ?caveat.} # heritage designation
            UNION {?stat pq:P518  ?caveat}  # applies to part
            UNION {?stat pq:P1810 ?caveat}  # subject named as
            ?caveat rdfs:label ?caveatlabel1. FILTER (LANG(?caveatlabel1) = "en")
          }
          BIND (IF(BOUND(?caveatlabel1),?caveatlabel1,"entry") AS ?caveatlabel2)

          BIND(CONCAT("[[d:Special:EntityPage/",STR(?ISSUERLABEL),"|",STR(?ISSUERLABEL),"]] [[:Template:Website]]") AS ?description)
        }
    }
  }
}
Try it!

Vicarage (talk) 09:13, 2 July 2024 (UTC)[reply]

@Vicarage: This, for instance. A runfirst hint on tripadvisor b/c there are only ~28k uses of that property, so constraining the query to those items makes the rest of it smaller. Forward gearing hint on the P31/P279 property path b/c it's probably more sensible to work from subject to object than the other way around (although in fairness I've not checked: may make little difference).
SELECT DISTINCT ?item ?itemLabel ?type 
  (CONCAT("+",STR(COALESCE(?pointintime,?starttime))) AS ?start)
  ?description ?notes ?link ?title ?keywords WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en-GB,en,fr,de,es,pt,pl,nl,cs". }
  { 
    SELECT DISTINCT ?item ?type ?pointintime ?starttime ?description ?notes ?link ?title ?keywords WHERE {
# places that need tripadvisor entries to be notable
   VALUES ?notable {wd:Q23413 wd:Q16748868} # castle, city walls

  ?item wdt:P3134 ?tripadvisor. hint:Prior hint:runFirst true.
  ?item wdt:P17 wd:Q145.

  ?item wdt:P31/wdt:P279* ?notable. hint:Prior hint:gearing "forward".

VALUES ?registerclass {
 wd:Q18618628 # cultural heritage
  wd:Q19829908 # places
  wd:Q22964288 # military
  wd:Q23779665 # maritime
  wd:Q74568206 # archives
}
      {
          ?registerclass ^wdt:P31 ?register.

         ?register wikibase:claim ?claim .
          ?register wikibase:statementProperty ?value.

          ?item ?claim ?stat .
          ?stat ?value ?entry .
          OPTIONAL {
          ?register wdt:P2378 ?issuer.
          ?issuer rdfs:label ?issuerlabel. FILTER (LANG(?issuerlabel) = "en")
          ?register wdt:P1630 ?URLprefix.
                  {?stat pq:P1435 ?caveat.} # heritage designation
            UNION {?stat pq:P518  ?caveat}  # applies to part
            UNION {?stat pq:P1810 ?caveat}  # subject named as
            ?caveat rdfs:label ?caveatlabel1. FILTER (LANG(?caveatlabel1) = "en")
          }
          BIND (IF(BOUND(?caveatlabel1),?caveatlabel1,"entry") AS ?caveatlabel2)

          BIND(CONCAT("[[d:Special:EntityPage/",STR(?ISSUERLABEL),"|",STR(?ISSUERLABEL),"]] [[:Template:Website]]") AS ?description)
        }
    }
  }
}
Try it!
--Tagishsimon (talk) 11:28, 2 July 2024 (UTC)[reply]

Optimizing this query

Hello! I'm trying to get the population of all the city-municipalities in Baden-Württenberg. The code is fine, but it runs out of time every single time. Can it be optimized somehow?

SELECT DISTINCT ?germany ?germanyLabel ?pop WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  ?germany wdt:P31 wd:Q42744322.
  ?germany wdt:P1082 ?pop.
  ?germany wdt:P131* wd:Q985.
}
Try it!

Theklan (talk) 20:55, 2 July 2024 (UTC)[reply]

@Theklan: A hint sorts it out. The hint tells the optimiser to work from the subject (?germany) to the object (wd:Q985), rather than from object to subject. The set of values for ?germany is quite small, so quite quick to work subject to object. The set of items that are P131* wd:Q985 is very large, so slow.
SELECT DISTINCT ?germany ?germanyLabel ?pop WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  ?germany wdt:P31 wd:Q42744322.
  ?germany wdt:P1082 ?pop.
  ?germany wdt:P131* wd:Q985. hint:Prior hint:gearing "forward".
}
Try it!
--Tagishsimon (talk) 22:02, 2 July 2024 (UTC)[reply]
Thanks! Theklan (talk) 07:02, 3 July 2024 (UTC)[reply]

Hey folks. I tried to build my own query for my Pywikibot bot but I can't get it to work to what I want, so I need some help.

I got a query working to get television series episode (Q21191270) that doesn't have Trakt.tv ID (P8013) but the related television series (Q5398426) has one.

SELECT DISTINCT ?episode ?episodeLabel ?episodeTraktId ?series ?seriesLabel ?seriesTraktId
WHERE {  
  
  ?episode p:P31 ?instanceOf.
  ?instanceOf (ps:P31/(wdt:P279*)) wd:Q21191270.
  
  MINUS {
    {
        ?episode p:P8013 ?episodeTraktId.
        ?episodeTraktId (ps:P8013) _:anyValueP8013.
    }
  }
  
  ?episode wdt:P179 ?series.
  ?series wdt:P8013 ?seriesTraktId.
  
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,[AUTO_LANGUAGE]". }
} LIMIT 10
Try it!

Now, I want to invert it, so I get television series (Q5398426) that doesn't have Trakt.tv ID (P8013) but the television series episode (Q21191270) within has one. I can't get it to work.

I tried to inverse the filter so it filters the television series (Q5398426) that doesn't have Trakt.tv ID (P8013) instead, and I can't get it to work. I tried to get television series episode (Q21191270) from television series (Q5398426) but I can't figure it out.

In the end, I want to do the following.

  1. Get all television series (Q5398426), television series season (Q3464665), and television series episode (Q21191270) by fetching television series (Q5398426) and its decendants, possibly using has part(s) (P527).
  2. Filter so that whats left is those that has at least one of the three has Trakt.tv ID (P8013) but not all three.
  3. Optional: Only show the television series (Q5398426) for feeding to the Pywikibot bot.
  4. Optional: Check additional entities at once such as Rotten Tomatoes ID (P1258) (for feeding only, so this would be a separate query, while leaving queries for each property as is so I can preview it)

I think that's it for now, and I appreciate the help. Thanks! Hans5958 (talk) 13:42, 7 July 2024 (UTC)[reply]

@Hans5958: Seems like your inverse is this:
SELECT DISTINCT ?episode ?episodeLabel ?episodeTraktId ?series ?seriesLabel ?seriesTraktId
WHERE {  
  
  ?episode p:P31 ?instanceOf.
  ?instanceOf ps:P31/wdt:P279* wd:Q21191270. 
  ?episode p:P8013 ?episodestat .
  ?episodestat ps:P8013 ?episodeTraktId .
  ?episode wdt:P179 ?series.
  MINUS { 
           ?series wdt:P8013 ?seriesTraktId.
        }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,[AUTO_LANGUAGE]". }
} LIMIT 10
Try it!
And then the one you want - which I understand to be, has one out of the three possible IDs, is in this direction ... you can cut down the elements in the SELECT to those fields you want to feed to pywikibot; you can add other IDs by optional clauses ... OPTIONAL {?edisode wdt:P1258 RT_ID.} .... &c. You can also BIND(COALESCE(ID1,ID2,ID3) as ?ID) if you want to present the singular ID in a singular field to Pywikibot.
It looks to me like the data is a bit dodgy; consider the series https://www.wikidata.org/wiki/Q237072#P8013 which seems to have TraktIds for two series_parts. And the query right now would give a row where the episode and series have no ID, but one of the series_parts (television series season) does have an ID. So, possibly some more work to do. Happy to play more, once you have seen this.
I've also moved it to look at wdt: rather than p: ps: because I expect all the values we're dealing with will be truthy.
Finally, there's more optimisation that could be done by way of selecting rows without labels first, and selecting labels second, so avoid timeouts which sporadically happen.
SELECT DISTINCT ?episode ?episodeLabel ?episodeTraktId ?series ?seriesLabel ?seriesTraktId  ?series_partTraktId
WHERE {  
  
  ?episode wdt:P31/wdt:P279* wd:Q21191270. 
  OPTIONAL {?episode wdt:P8013 ?episodeTraktId . }
  ?episode wdt:P179 ?series.
  OPTIONAL {?series wdt:P8013 ?seriesTraktId. 

  OPTIONAL {?series wdt:P527 ?series_part .
  ?series_part wdt:P8013 ?series_partTrakID . }
  }
   FILTER(!BOUND(?episodeTraktId) && !BOUND(?seriesTraktId)|| 
          !BOUND(?episodeTraktId) && !BOUND(?series_partTraktId) || 
          !BOUND(?seriesTraktId) && !BOUND(?series_partTraktId) && BOUND(?series_part)
         )
  
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,[AUTO_LANGUAGE]". }
} LIMIT 10
Try it!
--Tagishsimon (talk) 15:48, 7 July 2024 (UTC)[reply]
@Tagishsimon: Thanks for the help. The inverse query seemed to work as I expect, so that's one.
For the second one, I have to be honest that I found it quite difficult to explain it at that time. I think it is best for me to clarify my intentions.
My bot will fill the missing Trakt.tv ID (P8013) of television series season (Q3464665) (series), television series episode (Q21191270) (season), and television series (Q5398426) (episode) items. Since they are related to each other (most likely scenario: a season is part of a series, a episode is part of a season), the best approach in my opinion is to use the values of Trakt.tv ID (P8013) found in either of these three itmes. The bot will iterate each series using has part(s) (P527) to get the seasons and do the same on each seasons to get the episodes, so the bot only gets the list of series items, and nothing else.
I was thinking to do the following.
  1. Get all series and its decendants on two levels (season and episode items) with has part(s) (P527).
  2. Exclude any series when...
    1. ...the series, all of the seasons of the series, and all of the episodes of all of the season has Trakt.tv ID (P8013).
    2. ...the series, all of the seasons of the series, and all of the episodes of all of the season doesn't have Trakt.tv ID (P8013).
    You can say that for all (truthy) values of Trakt.tv ID (P8013)... (no statement = False), XOR all of it, and exclude if it is 0.
    Some examples:
  3. Only show television series (Q5398426) for feeding to the Pywikibot bot (The way I did it is just SELECT ?series as ?item but I said it here in case there are any better ideas)
For the example you gave on Q237072, I checked every seasons of it and I don't see any Trakt.tv ID (P8013) on it. In any case, this should be included by the query since the series has one.
I hope this clears any confusion. Please ask away if there is still one. Sorry for the hold up. Hans5958 (talk) 07:12, 8 July 2024 (UTC)[reply]

Grouping by items

Hello,

I tried to make a graph of people related to some specific organisations (associations or less formal groups with member of (P463) and political parties with member of political party (P102)), which I will call the “first people” or people from the first criteria, as well as the relatives they have in common (with a few different family properties).
The explanations are probably quite confusing to read, there is a shorter FILTERing formula closer to the end of this post which makes the logic more understandable.
As there are hundreds of people in these groups, I want to filter on several criteria. I want people connected to at least two groups or parties, excepted if it is only to political parties and no group, as several people are only changing parties and it is not interesting to include them. I also want some of the relatives of these people; the relatives appearing should be either connected to at least two of the “first people” themselves directly connected to groups (even if these “first people” are only connected to one group or party and therefore excluded from the first criteria), or be connected to one person connected to groups + connected to one group or party themselves.

I first went with a UNION to get people linked to groups and parties, and got different COUNTs on each, which we will call ?cG and ?cP, respectively for the number of groups and the number of political parties the person is connected with.

For the relatives’ part, I went with a subquery reproducing the first UNION, added the relatives, and name the ?relative variable as the ?item variable in order to UNION this section with the first part and counted the connections to the relatives with ?cR. The further FILTER operations with the COUNTs will need to be done outside of the UNIONs, as it will be their different sums which will include or not the people in the final request.

However I failed to properly write the UNIONs. I succeeded getting a result by having the two first COUNTs separated from the relatives’ one, but this way it does not group/merge the items in one, and the people appearing in both subqueries get two lines. This is now my main problem, all the rest can be ignored, I should be able to continue after solving this merging problem.

The current request is:

SELECT DISTINCT ?item ?itemLabel (count (?memberOf) AS ?cG) (count (?memberOfPP) AS ?cP) ?cR WHERE {
  {
    VALUES ?memberOf { a bunch of wd:Q }.
    ?item wdt:P463 ?memberOf.
  }
  UNION
  {
    VALUES ?memberOfPP { another bunch of wd:Q }.
    ?item wdt:P102 ?memberOfPP.
  }
  UNION
  {
    SELECT (?relative AS ?item) (count (?relative) AS ?cR) WHERE {
      SELECT DISTINCT ?item ?relative WHERE {
        {
          VALUES ?memberOf { a bunch of wd:Q again }.
          ?item wdt:P463 ?memberOf.
        }
        UNION
        {
          VALUES ?memberOfPP { another bunch of wd:Q again }.
          ?item wdt:P102 ?memberOfPP.
        }
        ?relative ^(wdt:P22|wdt:P25|wdt:P26|wdt:P40|wdt:P451|wdt:P1038|wdt:P3373|wdt:P3448) ?item.
      }
    } GROUP BY ?relative
  }
  SERVICE wikibase:label {bd:serviceParam wikibase:language "fr,en". }
} GROUP BY ?item ?itemLabel ?cR
Try it!

I have not written the FILTERing conditions in order to better visualise the results. It should look like the following conditions, but it will be quite trivial to write so it is of no importance for the moment:
(?cG > 1) OR (?cR > 1) OR ((?cG + ?cR) > 1) OR (?cG > 0 AND (?cG + ?cP) > 1) OR (?cR > 0 AND (?cR + ?cP) > 1)

Also in the end, it will be a graph between people, the groups and political parties they are belonging to and some other people connecting groups with others. This will also be done once the conditions can be applied, by joining the result with simple OPTIONAL queries or another way. This will not be complicated to write, so I do not need help on this either.

If this query can be made, I will then add a last criteria/UNION consisting of adding the people excluded from the first criteria, but connecting to people included in the other main query. For example someone who is member of only one targeted group, but who has a relative themself only included in the main query as being a relative of 2+ people from the first UNION. In other words, someone appearing in one of the two first groups/parties’ UNIONs without fulfilling the whole criteria (so normally not in the relatives’ UNION), but who is the cause of people in the relatives’ UNION to appear.

Thanks in advance! --Bischnu (talk) 23:33, 7 July 2024 (UTC)[reply]

After letting the query aside for a few days, I succeeded in getting what I wanted. The correct query is:
SELECT DISTINCT ?item ?itemLabel (count (?memberOf) AS ?cG) (count (?memberOfPP) AS ?cP) (count (?source) AS ?cR) WHERE {
  {
    VALUES ?memberOf { a bunch of wd:Q }.
    ?item wdt:P463 ?memberOf.
  }
  UNION
  {
    VALUES ?memberOfPP { another bunch of wd:Q }.
    ?item wdt:P102 ?memberOfPP.
  }
  UNION
  {
    SELECT DISTINCT ?source ?item WHERE {
      {
        VALUES ?memberOf { a bunch of wd:Q again }.
        ?source wdt:P463 ?memberOf.
      }
      UNION
      {
        VALUES ?memberOfPP { another bunch of wd:Q again }.
        ?source wdt:P102 ?memberOfPP.
      }
      ?item ^(wdt:P22|wdt:P25|wdt:P26|wdt:P40|wdt:P451|wdt:P1038|wdt:P3373|wdt:P3448) ?source.
    }
  }
  SERVICE wikibase:label {bd:serviceParam wikibase:language "fr,en". }
} GROUP BY ?item ?itemLabel ?cR
Try it!

Agents linked to an instition (collection)

Hi, I am trying to query and then visualize the people, i.e. staff members, collection agents and more connected to my institution which houses a large collection (http://www.wikidata.org/entity/Q233098).

The people are connected via: P108: employer P1416: affiliation P485: archives at P11146: collection items at

I have created this easy query but then the people are not linked... https://w.wiki/AcpG Since the Archive and Herbarium are part of the main institution, all people are connected to Q233098. Some are connected with each other (via the property 'significant person' as co-authors or co-collectors, some in academic relations (doctoral adivisor, doctoral student) but not so many yet. More work needs to be done. Occupations could maybe be useful to group the agents.

Any suggestions how best to visualize this in a graph of people connected to the museum? Thank you! S.v.Mering (talk) 13:04, 10 July 2024 (UTC)[reply]

Hello,
From what I know, in order to visualise the museums and links to the museums, you have to add the column with the museum in the SELECT, so each row would be a combination of a person and the museum it is linked to.
Here is an example based on your query: https://w.wiki/Aczp, which is not very elegant as I let it in different UNIONs and different variables. I did not know if you wanted to keep the exact result you already had (so the employer and affiliation properties only applies to the first museum, the archives at only to the second…), or wanted to have every properties for all museums. If that is the case, you could replace some properties’ links with an OR symbol (for example: ?item ((wdt:P108|wdt:P1416)/(wdt:P279*)) ?museumLinked. ) and the entities in the VALUES with the three museums. This does exactly the same but I joined some UNION together and called the variables under the same name which makes it more simple to read but you lose the kind of affiliation to the museum: https://w.wiki/Ac$o.
I am not very good in SPARQL, I hope it still helped you. --Bischnu (talk) 20:33, 10 July 2024 (UTC)[reply]
Thank you, Bischnu! This is helping me a lot already.
The issue with the archive and herbarium is that they are part of the same (main) museum that was used for affiliation/employer but I did not know how to include the parts of the organization in the original query (e.g. for 'collection items at').
As it is now, I can see three sub-graphs, which is also quite nice. I will need to play around a little and try it out.
Thank you for your help, very much appreciated! S.v.Mering (talk) 23:18, 10 July 2024 (UTC)[reply]

Match string in text property values

How can we find sub-strings in values of author name string (P2093), like "Review By:" seen in this fix? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:56, 11 July 2024 (UTC)[reply]

There don't seem to be any other example of the Review By problem, so I'll illustrate one approach to the report using the Charles E Griswold string:
SELECT DISTINCT ?item ?itemLabel 
WHERE {
  hint:Query hint:optimizer "None".
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:api "Search";
                    wikibase:endpoint "www.wikidata.org";
                    mwapi:srsearch "Charles E Griswold".
    ?item wikibase:apiOutputItem mwapi:title .
  }
  ?item wdt:P2093 ?string . 
  filter(regex(".*charles e griswold.*",lcase(?string)))
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
Try it!
--Tagishsimon (talk) 18:26, 11 July 2024 (UTC)[reply]
Hello,
I think that it can be done with the function CONTAINS. You would apply it somehow like this:
SELECT ?item ?ans WHERE {  
?item wdt:P31 wd:Q13442814;  
wdt:P2093 ?ans.  
FILTER(CONTAINS(?ans,"Review by:")).  
} LIMIT 10
Try it!
It is probably better to restrict on a smaller subset than all scholarly articles as I did there, this request does not succeed and times out. Also, if you are sure that the string you are looking for is always at the beginning, it is also probably better to replace “CONTAINS” by “STRSTARTS”; the latter normally needs less resources to compare the strings. --Bischnu (talk) 18:37, 11 July 2024 (UTC)[reply]
author name string (P2093) is used ~139M times, mostly on scholarly article (Q13442814) items. A query predicated on filtering 139M P2093 values is bound to fail given a 60s runtime. The reason that it is appropriate to approach this problem using MWAPI Search is that it can locate the small set of broadly qualifying items, after which a filter can be brought to bear on that small set. --Tagishsimon (talk) 19:32, 11 July 2024 (UTC)[reply]
OK thanks, great to know! I do not understand how it works. Does the MWAPI only look for the string in all the titles / item labels, after which we search for the author string in these items?
Also, I just had a look on what is MWAPI. I saw that you could find a string in all Wikipedia articles for example? How could something that huge not fail? Bischnu (talk) 19:48, 11 July 2024 (UTC)[reply]
It's a search engine - specifically CirrusSearch [1], based on w:en:Elasticsearch. Search engines are designed to index large text corpii and return useful results based on search strings. MWAPI is just a route to access Cirrus. I'm not sure offhand which parts of a wikidata item are indexed, but it's much more than labels and descriptions; certainly, P2093 is indexed, else we would not have got results in the above query. There's also an extension to Cirrus customised for wikidata - Extension:WikibaseCirrusSearch [2] which allows searches to combine text strings with statement values, so mwapi:srsearch "numismatic haswbstatement:P31=Q13442814"., for instance. --Tagishsimon (talk) 20:52, 11 July 2024 (UTC)[reply]

──────────────────────────────────────────────────────────────────────────────────────────────────── Thank you, Simon. Your example works, but a version searching for the string "Andy" finds zero resuts. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 08:31, 12 July 2024 (UTC)[reply]

Yes. Odd. --Tagishsimon (talk) 11:47, 12 July 2024 (UTC)[reply]

Pipe symbol in REPLACE

How can I specify the pipe symbol '|' in a REPLACE?

REPLACE(?entry,"\|","%7C")

returns

Query is malformed: Lexical error at line 48, column 23. Encountered: "|" (124), after : "\"\\" Vicarage (talk) 09:36, 12 July 2024 (UTC)[reply]
REPLACE(?entry,"\\|","%7C") --Tagishsimon (talk) 11:46, 12 July 2024 (UTC)[reply]

Hi. Would it be possible to change this query to only take into account wikipedia sitelinks and not wikisource, commons, etc. ones? Thanks in advance, Paucabot (talk) 12:53, 12 July 2024 (UTC)[reply]

In what way? If you simply want to list entities that have at least one wikipedia sitelink you can do that by adding a semi-join: https://w.wiki/Aena . But if you want the count of sitelinks only to wikipedia that will require a lot more work. I might be too lazy to write the latter. Infrastruktur (talk) 12:35, 13 July 2024 (UTC)[reply]
I think I'm asking for the latter. Paucabot (talk) 20:29, 13 July 2024 (UTC)[reply]

Conflation query barely returns any results

Not sure why this query is returning so few results. It only returns 29 items, but I expected it to return a lot more. Twin Star Exorcists (Q17228961) is one item that has both conflation (Q14946528) and anime television series (Q63952888), so not sure why it isn't part of the results too. Any help would be really appreciated!

SELECT ?item ?itemLabel (GROUP_CONCAT(?instanceOfLabel; SEPARATOR=", ") AS ?instanceOfValues) WHERE {
    ?item wdt:P31 wd:Q14946528.
    ?item wdt:P31 ?type.
    VALUES ?type { wd:Q63952888 wd:Q74262765 }
    ?item wdt:P31 ?instanceOf.
    ?instanceOf rdfs:label ?instanceOfLabel.
    FILTER(LANG(?instanceOfLabel) = "en")
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  }
  GROUP BY ?item ?itemLabel
Try it!
Finnius00 (talk) 22:36, 12 July 2024 (UTC)[reply]