@paregorios @captain_primate @seanmunger @mlemweb @jenniferlouise @steko @ryanfb @Electricarchaeo @ekansa @sebhth
Hm. I feel like Ancient risks being confusing when applied to my stuff, as it's often used with the more specific meaning of "pre-medieval" by medievalists like myself?
Yeah, that there is a conventional disciplinary boundary that IMO gets in the way of so many interesting things. But at the same time, transgressing the terminology of that boundary can sow confusion and misguided reaction that is equally obstructive.
:|
@sebhth @ekansa @Electricarchaeo @ryanfb @steko @jenniferlouise @mlemweb @seanmunger @captain_primate
@paregorios @captain_primate @seanmunger @mlemweb @jenniferlouise @steko @ryanfb @Electricarchaeo @sebhth @JubalBarca
Yep. A hashtag tends to work if people can recognize its meaning without much effort. That tends to entrench concepts that maybe should be disrupted / questioned.
I don't have an easy answer, but I'm in favor of following some updates about "humanistic scholarship about older things".
@ekansa @paregorios @captain_primate @seanmunger @mlemweb @jenniferlouise @steko @ryanfb @Electricarchaeo @sebhth
Maybe ditch the "Today" since it'll just be whenever we're posting it, and use #BeforeModernTimes or something? Though I guess that may imply different things to non-historians who don't think of "modern" as "post sixteenth century"...
@JubalBarca @paregorios @captain_primate @seanmunger @mlemweb @jenniferlouise @steko @ryanfb @Electricarchaeo @sebhth
#BeforeModernTimes works for me, and I'm OK with fuzziness and differences of opinion about what that actually means.
@paregorios @JubalBarca @captain_primate @seanmunger @mlemweb @jenniferlouise @steko @ryanfb @Electricarchaeo @sebhth
If we want to get pedantic, we can use http://perio.do to reference more precise period definitions, but URIs don't work the same as hashtags in social media...
@paregorios @JubalBarca @captain_primate @seanmunger @mlemweb @jenniferlouise @steko @ryanfb @Electricarchaeo @sebhth
Just a quick update, because I'm looking at server logs now...
It seems that the Moz DotBot has suddenly taken a huge interest in data documenting #BeforeModernTimes #archaeology material culture. It is doing an intense crawl of Open Context.
@paregorios @JubalBarca @captain_primate @seanmunger @mlemweb @jenniferlouise @steko @ryanfb @Electricarchaeo @sebhth
I just finished indexing data documenting about 4500 #archaeology sites in California that link to the Phoebe A. Hearst Museum of Anthropology collections.
@ekansa @paregorios @JubalBarca @captain_primate @mlemweb @jenniferlouise @steko @ryanfb @Electricarchaeo @sebhth
Is #historodons still a thing? Honestly I don't do a whole lot of ancient or even medieval stuff anymore; my podcast is focused on the 19th century, so I won't be using the earlier period hashtags that often.
I haven't seen much traffic on that hashtag (other than from you and @JubalBarca), but that may just be a consequence of my anemic follow network so far. In any case, I think that it would be good to keep using that hashtag when appropriate (and I see that JB has done so).
@sebhth @Electricarchaeo @ryanfb @steko @jenniferlouise @mlemweb @captain_primate @JubalBarca @ekansa
@JubalBarca @paregorios @captain_primate @seanmunger @mlemweb @jenniferlouise @steko @ryanfb @Electricarchaeo @sebhth
#BeforeModernTimes
OK. I'm wanting to update how we archive data with Open Context.
I'm seriously looking at Zenodo. The question is, is it still worthwhile to put 1.5 million OC GeoJSON-LD files in GitHub? GitHub is a pain at that kind of scale, but if people think it essential or useful, I want to know.
Otherwise, I'll just build off the Zenodo API.
Thoughts?
Who uses your data and do they like getting it from GitHub?
Because GitHub is a commercial single point of failure. And if it's more trouble than it's worth for you, and if nobody's clamoring for it, why not go straight to the archive?
That's my impression also. I think it will be more trouble than it is worth to go into GitHub. I just wanted to check to see if anyone had a compelling reason to also use it to version control structured data.
@ekansa Well, I like the notion of being able to do that with the #PleiadesGazetteer #JSON (and that's why I keep that JSON formatted and key-sorted), but *practically* I'm not sure what I'm getting out of it.
Yep, our JSON also has predictable key sorting. It's also sorta fun to see the GeoJSON rendered in GitHub. But Zenodo does the versioning thing.
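A minimal sketch of the key-sorting idea mentioned above, assuming plain Python and the standard `json` module. The record is a made-up example, not actual Pleiades or Open Context data. Sorting keys and fixing the indentation means the same content always serializes to the same text, so version-control diffs only show real changes.

```python
import json

def canonical_json(obj):
    """Serialize with sorted keys and fixed indentation so diffs stay stable."""
    return json.dumps(obj, sort_keys=True, indent=2, ensure_ascii=False) + "\n"

# Hypothetical records with the same content but different key order.
a = {"type": "Feature", "id": "example-1", "properties": {"title": "Example"}}
b = {"properties": {"title": "Example"}, "id": "example-1", "type": "Feature"}

# Both serialize to identical text once keys are sorted.
print(canonical_json(a) == canonical_json(b))
```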
Next question. Tens of thousands of GeoJSON-LD files in an archival "deposit". Should I just compress many files into giant tarballs or is there value in having each one individually identified / accessible in the repository?
My guess has been that a single or small number of giant, compressed blobs is preferable so that interested parties don't have to do lots of repetitious interaction with the archive server. Thus, in both Zenodo and the NYU FDA, Pleiades data is a single zip file:
https://doi.org/10.5281/zenodo.1193921
http://hdl.handle.net/2451/41737
This is based solely on personal annoyance with getting data from other places for other things.
Yep. OK. This all makes sense. I'll get cracking on this! I will make a different zip archive (probably zip, because it's easier for non-Linux folks) for each DOI-identified dataset in Open Context. Some will have tens of thousands of JSON files, image files, etc.; some will be just a CSV (for table dumps).
Sound workable?
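One way the "one dataset, one zip" plan could be sketched, assuming each dataset already sits in its own directory on disk. The function name `zip_dataset` and the directory layout are hypothetical, not Open Context's actual export code.

```python
import zipfile
from pathlib import Path

def zip_dataset(src_dir, zip_path):
    """Write every file under src_dir into one zip, with paths relative to src_dir."""
    src = Path(src_dir)
    names = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Sorted order keeps the archive listing predictable between runs.
        for path in sorted(src.rglob("*")):
            if path.is_file():
                arcname = path.relative_to(src).as_posix()
                zf.write(path, arcname)
                names.append(arcname)
    return names
```

The zip should be written somewhere outside `src_dir`, otherwise the walk may pick up the archive that is being created.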
@ekansa Yeah, that fits my brain, fwiw
@ekansa @paregorios this all sounds like what I would do myself: 1) move away from GitHub - single point of failure and not very good for "big data" 2) leverage the Zenodo API with versioning 3) one dataset = one archive file (with a descriptor? e.g. a datapackage.json or similar metadata)
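A rough sketch of the descriptor idea in point 3, assuming a minimal Data Package style `datapackage.json` that simply lists the files in the archive. The function name and the project name and title are placeholders; a real descriptor would likely carry licenses, sources, and more.

```python
import json

def make_descriptor(name, title, file_paths):
    """Build a minimal Data Package style descriptor listing each file as a resource."""
    return {
        "name": name,
        "title": title,
        "resources": [{"path": p} for p in sorted(file_paths)],
    }

# Hypothetical project and file names, for illustration only.
descriptor = make_descriptor("example-project", "Example Project", ["b.json", "a.json"])
print(json.dumps(descriptor, indent=2))
```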
Stefano -> YES! Thanks, I'm now in active development for archiving with Zenodo. The main issue is always granularity for us, and bundling up a bunch of JSON files into one submission is very attractive. For the most part that will work, but in some cases, there are some complex licensing issues. Some datasets need to have a variety of licenses for images, so I have to break them apart into different archive bundles for Zenodo.
Nothing is ever simple.
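The license split described above could look something like this, assuming each deposit declares a single license, so files have to be grouped by license before bundling. The file names and license identifiers are made up for the example.

```python
from collections import defaultdict

def bundle_by_license(files):
    """Group (path, license) pairs so each bundle carries exactly one license."""
    bundles = defaultdict(list)
    for path, license_id in files:
        bundles[license_id].append(path)
    return dict(bundles)

# Hypothetical files with mixed licenses.
files = [
    ("img/001.jpg", "cc-by-4.0"),
    ("img/002.jpg", "cc-by-nc-4.0"),
    ("data/items.json", "cc-by-4.0"),
]
print(bundle_by_license(files))
```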
@ekansa @paregorios ah, licensing issues, the minefield of open data. Still, I think #Zenodo can handle relationships between datasets with a variety of predicates, that could work well to keep different pieces of the same bigger archive together (untested)
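As a sketch of the relationship idea, Zenodo's deposit metadata accepts a `related_identifiers` list with a `relation` and an `identifier` per entry. The helper below only builds the metadata dictionary and makes no API call. The DOI is a placeholder, and using `isPartOf` to tie the pieces to a parent record is an assumption about how one might model it.

```python
def link_to_parent(metadata, parent_doi):
    """Return a copy of deposit metadata that points back at a parent record."""
    linked = dict(metadata)
    related = list(linked.get("related_identifiers", []))
    related.append({"relation": "isPartOf", "identifier": parent_doi})
    linked["related_identifiers"] = related
    return linked

# Placeholder title and DOI, for illustration only.
images = {"title": "Example Project: images"}
print(link_to_parent(images, "10.5281/zenodo.0000000"))
```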
Here's a test upload (in the sandbox) of the files associated with a small project.
Does this look useful?
https://sandbox.zenodo.org/record/217212
Tips for improvement? All the metadata is generated from metadata we already had, so I'm pleased by the ability to script uploads and documentation.
@steko @paregorios putting @aejolene in the loop also.
One more thing about our use of Zenodo. I'm making separate archives for all the pictures associated with an Open Context project and the structured data in an Open Context project. The reason is that we'll too often run into the Zenodo limits on storage for an individual archive. Because of this, some Open Context projects will have many different Zenodo archives.
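A minimal sketch of splitting a project's files into several archives that each stay within a size cap. The cap is passed in as a parameter rather than hard-coded, since the actual Zenodo limit is not stated here, and the greedy packing is just one simple approach.

```python
def split_by_size(files, max_bytes):
    """Greedily pack (name, size) pairs into bundles of at most max_bytes each."""
    bundles, current, current_size = [], [], 0
    for name, size in files:
        if size > max_bytes:
            raise ValueError(f"{name} alone exceeds the limit")
        # Start a new bundle when adding this file would go over the cap.
        if current and current_size + size > max_bytes:
            bundles.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        bundles.append(current)
    return bundles

# Hypothetical file sizes in bytes.
print(split_by_size([("a.jpg", 4), ("b.jpg", 4), ("c.jpg", 4)], max_bytes=10))
```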
@ekansa @steko @paregorios Thanks! I've been lurking...
I've got nothing to add but I'm learning from the thread.
Sorry, tooted in haste last night. "A Quiet Place" was about to begin (BTW: fun movie!!!)
My toot last night was a messy way of saying that I have to break up Open Context projects into separate Zenodo archives. Not ideal, but metadata should make their common relationships explicit.
Second example, this one has Pleiades and PeriodO URIs in the metadata:
https://sandbox.zenodo.org/record/217236
I think I'm more or less OK to advance to using the real API not the sandbox?
Do you think it is good to go for production use?
OK. I'm now uploading our image data into Zenodo. At the rate things are going, this will take several days running round the clock.
I can't believe this is really a free service, especially when I consider costs of other digital repository services.
So what's the downside to using Zenodo?
@ekansa @paregorios @aejolene I think for your use case the downside is that #Zenodo is a generic service with nothing specific to the OC disciplinary realm (e.g. compared to ADS or TDAR). Other than that, the sustainability model is rather sound, but again very different from the depositor-pays approach.
I'm hoping that OC mitigates the downsides by providing curation services as it publishes data. We are too small to do longevity / preservation services.
@ekansa Can Zenodo handle access restrictions for sensitive data?
@paregorios @steko
@aejolene @ekansa @paregorios yes, it offers a per-package embargo option with a user-defined expiration date (investigated this last year for a NAGPRA problematic submission to JOAD)
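For what the embargo option might look like in scripted uploads: Zenodo's deposit metadata has an `access_right` field and an `embargo_date` field. The helper below only prepares the metadata and does not call the API. The title and date are placeholders, and whether an embargo is the right tool for a given sensitive dataset is a separate question.

```python
from datetime import date

def embargoed_metadata(metadata, release_date):
    """Return a copy of deposit metadata marked as embargoed until release_date."""
    updated = dict(metadata)
    updated["access_right"] = "embargoed"
    updated["embargo_date"] = release_date.isoformat()
    return updated

# Placeholder title and expiration date.
print(embargoed_metadata({"title": "Example sensitive dataset"}, date(2030, 1, 1)))
```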
@steko @ekansa @paregorios This could be really something, especially since all data I'm responsible for connect to OC/DINAA recs and we have zero preservation infrastructure (having a crisis right now, actually).
@JubalBarca @sebhth @ekansa @Electricarchaeo @ryanfb @steko @jenniferlouise @mlemweb @seanmunger @captain_primate
So maybe also use #MedievalToday ? I feel like we could all benefit from clubbing together ...