Wednesday February 08, 2012

Profile for Cory Banack (cbanack)

  • OFFLINE
  • Rank: Platinum Boarder
  • Register Date: 02 Jan 2010
  • Last Visit Date: 04 Feb 2012
  • Time Zone: GMT +0:00
  • Local Time: 01:06
  • Posts: 564
  • Profile Views: 439
  • Karma: 45
  • Location: Canada
  • Gender: Male
  • Birthdate: Unknown

Signature

Comic Vine Scraper: latest version here, bugs and suggestions here.
Posts

emo
That's very interesting! It would appear that your users skew towards high-end Android devices, if you compare them with the distribution of Android versions in general.
Full Edition usage i ...
Category: General
emo
Ahh yes, the problem there is definitely the filenames. I assume the 200 in there refers to issue number 200. So the others are named with 199 or 201?

This is not an uncommon problem. A couple of solutions:

1. Rename the files to something simpler, like: 'Action Comics 200'. If you're fairly computer-savvy this isn't too hard, because there are free programs out there for bulk-renaming large groups of files. You can even rename them back to the old names once you're done scraping, if you like.

2. Wait a few weeks for the next version of the scraper to come out. It will have some new features that will make handling your situation quite a bit easier.
Strange issue with C ...
Category: Scripts
emo
What do the filenames for these comics look like?

Have you turned off rescraping in the preferences, or have you turned off scraping to the tags or notes fields? Either of those can cause a previously scraped comic book to be searched for again after the first scrape... Otherwise, the scraper should remember which series and issue you chose the last time you scraped that file.
Strange issue with C ...
Category: Scripts
emo
What do the file names for these comics look like?

Comic Vine Scraper bases the initial search for a comic book on the name of the file being searched; usually strange search terms are a result of strange file names.

However, once you have scraped the comic and specified the correct series once (searching for the series name manually if you have to) all future scrapes of that comic should work correctly without you needing to specify anything after that (regardless of whether the comic has a "weird" file name.)
Strange issue with C ...
Category: Scripts
emo
forkicks, I am not sure whether to love you or hate you.

Nah, just kidding. These examples you've been digging up are going to help a lot. Thanks!
Story Arc vs Alterna ...
Category: General
emo
Blarrrrrgh.

Still, seeing examples like that is tremendously helpful. If you find any more, please post them! The biggest danger here is that I don't collect nearly as many comics as some of you guys do, so I'm not familiar with the many different kinds of story arc titles.
Story Arc vs Alterna ...
Category: General
emo
@forkicks: Yikes, really? Those are for Story Arcs, not crossovers?

Issues have titles like:

A Great Story: Introduction
A Great Story: Conclusion

?

If that's true, you're right, it IS a very difficult problem to solve. I was kind of counting on there being some numbers involved, at least; otherwise it's pretty hard to tell the difference between a fancy title and one that actually contains a story arc in it.

I guess the most common format is still something like:

A Great Story, Part 3
Story Arc vs Alterna ...
Category: General
emo
If they had hyperlinks to each review available from their API, I'd consider trying to scrape those in somehow--I'm not really interested in trying to copy the entire HTML text of the reviews in, though.

It doesn't matter either way, though, because right now ComicVine doesn't actually provide any way to access Reviews from the web API.
Comic Vine Scraper 1 ...
emo
600WPMPO wrote:

Alternate series = ComicVine's Story Arc = Regenesis
Story Arc = Goodbye Chinatown

Ok, we can do it that way.

I'd also humbly suggest to cYo that the field that is currently called "Alternate Series or Storyline Title" would be better named "Alternate Series or Crossover Title". A "Storyline Title" is now more properly stored in the "Story Arc" field, right?

If we look closely, mostly the story arc name is hidden in the title. e.g. Title: Goodbye Chinatown, Part One. Can the scraper be 'tricked' into pulling this out from the title?

I can give it a try, but it's difficult to get it perfectly right because, as Yellowbox says:

The title numbering is horribly inconsistent on ComicVine. "Part", "pt", "issue", "#", "no.", commas and no commas, colons and no colons, "1" or "One" or "One of Seven".

That makes it tricky to write a program to pull the right part out every time. It would help if I could come up with a complete list of the ways story arc names are embedded in comic book titles. Based on what Yellowbox said, here's what I have so far (for a story arc called "A Great Story"):

A Great Story Part 1
A Great Story, Part 1
A Great Story: Part 1
A Great Story - Part 1

Plus all of those with these different variations on the word "Part":

"pt", "pt.", "#", "number", "num", "num.", "no."

Plus all of those with the following different variations on numbers:

"One", "1", "One of Five", "One of 5", "1 of 5", "1 (of 5)", "1 (of Five)"

Can anyone think of any other variations?
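
To make the list above concrete, here is a rough Python sketch of a matcher that covers those variations. It is only an illustration, not the scraper's actual code: the function name `extract_story_arc`, the regular expression, and the choice to spell out numbers only up to "ten" are all assumptions.

```python
import re

# Hypothetical sketch, not the scraper's real implementation.
# Variations on the word "Part" taken from the list above.
_PART_WORDS = r"(?:part|pt\.?|number|num\.?|no\.|#)"
# Digits, or spelled-out numbers (only one to ten here, as an assumption).
_NUMBER = r"(?:\d+|one|two|three|four|five|six|seven|eight|nine|ten)"

_ARC_PATTERN = re.compile(
    r"^(?P<arc>.+?)"                             # arc name, as short as possible
    r"\s*[,:\-]?\s*"                             # optional comma, colon or dash
    + _PART_WORDS +
    r"\s*" + _NUMBER +                           # "1" or "One"
    r"(?:\s*\(?\s*of\s+" + _NUMBER + r"\s*\)?)?" # optional "of 5" / "(of Five)"
    r"\s*$",
    re.IGNORECASE,
)

def extract_story_arc(title):
    """Return the story arc name embedded in an issue title, or None."""
    match = _ARC_PATTERN.match(title.strip())
    if match is None:
        return None
    return match.group("arc").strip()
```

With this sketch, "Goodbye Chinatown, Part One" and "A Great Story: pt. 1 (of Five)" give "Goodbye Chinatown" and "A Great Story". A title with no number in it, like "A Great Story: Introduction", gives None, so that case is still unsolved.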
Story Arc vs Alterna ...
Category: General
emo
It seems to me like we may now have one extra field that we don't really need...?

Series Name: Wolverine
Title: Goodbye Chinatown, Part One
Story Arc: Regenesis
Alternate Series Name: ???

As far as I'm aware, Comic Vine only contains at most three names for any given issue: the name of the series, the title of the issue, and the name of the story arc. Maybe it would have been better if the "Alternate Series Name" had been renamed to "Story Arc", instead of being a whole new metadata field?
Story Arc vs Alterna ...
Category: General
emo
oraclexview wrote:
I have to agree. cbanack, your work is beyond astounding. In fact, your great work makes you, in my opinion 2nd in command of the Astonishing & Uncanny CR-Men! Right behind cYo of course...cYo is the great Professor C after all! lol

Hopefully this means I'll get some Uncanny Superpowers!

But I could never compete with Professor C--his superpower must be the ability to code new features with his mind while he's asleep. Otherwise, I don't know how one person can write so much software so fast...
ComicVine Scraper - ...
Category: Scripts
emo
Hmmm, I thought I saw you sneaking around the Google Code page...

As for my plans, the next release is going to focus on making use of the improved ComicVine API, so many of the changes will be "under the hood" -- the Comic Vine connection code will get a lot cleaner and easier to read, but the main thing most people will notice is that it runs faster. Especially the first time you scrape a series--no more "loading issues" dialog.

The improved API also gives me fast access to the title of each issue (when Comic Vine has one), so I've already changed the "Show Issues" dialog to display those titles, instead of the series name repeated over and over (which was kind of useless anyway):



I've got a few bugs to fix, of course, and obviously the story arcs should get scraped into the new Story Arc metadata field that cYo added (right now they get scraped into the Alternate Series metadata field, but that's not really the right place anymore.)

Hmmm...what else? I've already changed the scraper so that it extracts the series names and issue numbers directly from the comic's filename when scraping a comic for the first time (instead of relying on ComicRack's "shadow values" to give it that information.) Most people won't notice much of a difference, it just gives me more flexibility when it comes to dealing with weirdly named series like "2000AD."
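
As a rough illustration of that kind of filename parsing (an assumed sketch, not the scraper's real rules; `parse_comic_filename` and its pattern are made up for this example), the idea is to treat the last whitespace-separated number as the issue number, so a number that is part of the series name is left alone:

```python
import os
import re

# Hypothetical sketch: the real scraper's parsing rules are not shown here.
_FILENAME_PATTERN = re.compile(
    r"^(?P<series>.+?)"        # series name, as short as possible
    r"\s+#?(?P<issue>\d+)"     # whitespace, optional '#', then the issue number
    r"(?:\s*\(.*\))?\s*$"      # optional trailing "(...)", e.g. a year
)

def parse_comic_filename(path):
    """Return (series_name, issue_number) guessed from a comic's filename.

    issue_number is None when the name has no separate issue number.
    """
    name = os.path.splitext(os.path.basename(path))[0]
    match = _FILENAME_PATTERN.match(name)
    if match is None:
        return name.strip(), None
    return match.group("series").strip(), int(match.group("issue"))
```

Here "Action Comics 200.cbz" comes out as ("Action Comics", 200), while "2000AD 1234.cbr" comes out as ("2000AD", 1234) because the issue number has to be separated from the series name by whitespace.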

Also, I'd like to try to make it so that the series search results tend to "prefer" series that you've chosen in the past--so if there are 5 series called "Batman", the one you picked last time will be more likely to appear at the top of the list next time. It's sort of like your choices will "train" the scraper to make better guesses about which series you are collecting and scraping each week.
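
One simple way this could work, sketched in Python under the assumption that the scraper keeps a count of how often each series has been chosen (the function name, the count dictionary, and the series IDs in the usage note are all hypothetical):

```python
def rank_series_results(results, chosen_counts):
    """Move previously chosen series toward the top of the results.

    results: list of (series_id, series_name) tuples in their original order.
    chosen_counts: dict mapping series_id to how many times it was picked.
    sorted() is stable, so series that were never picked keep their
    original relative order.
    """
    return sorted(results, key=lambda result: -chosen_counts.get(result[0], 0))
```

For example, with three results all named "Batman" and IDs 101, 102 and 103, a history of `{103: 2, 102: 1}` would list 103 first, then 102, then 101.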

Oh yeah, and I'm going to add this "cvinfo" file feature. That one will be tremendously useful for people (like me!) who sort their comics into directories based on series.

I've got lots of other small (and bigger) ideas, but they'll have to wait. I'm not even sure I'll get to all of this stuff for the next release, but I'll try. I just don't want to push the release back too far, since it's already been a long time since we've had an update, and I know a lot of people are waiting for certain features, like the Arc metadata thing.
ComicVine Scraper - ...
Category: Scripts
emo
The next version of the scraper should be out within a month, probably less if I stay productive.
Tips & Tricks Manual ...
emo
So what percentage of those have you read?!?
Personal Milestone: ...
Category: General
emo
cYo wrote:
Don't forget remembering the last series one has selected


Yeah, I was thinking about making it so that all the series you've chosen in the past tend to float to the top of the results list when you search.

And a new one: some matcher to filter out Publishers. ComicVine seems to go global and there are some local publisher are now in the list with the same series.


Interesting...can you (or someone) give me an example of a series that appears twice with different publishers?
ComicVine Scraper - ...
Category: Scripts
emo
In fact, this IS something in my medium-to-long range plans. But it will require an enormous number of changes to the existing scraper. Right now, as you say, the scraper plugin doesn't directly write to the comic files, ComicRack does. And it's not really safe to change that--when ComicRack is running, other programs shouldn't mess with the comic files directly. That means I will probably need to create a separate, standalone version of the scraper in order to do what you're suggesting.

Still, I've been slowly chipping away at the changes needed to do that kind of thing, but I'm not going to be finished soon (i.e. not in the next release of the scraper.) I just noticed today that ComicVine seems to have improved their API format, which should allow me to greatly speed up how fast the scraper runs...so that's going to be my project for the next little while and for the next release of the scraper.
ComicVine Scraper - ...
Category: Scripts
emo
I quite like this feature suggestion too, and it will definitely be in the next release of Comic Vine Scraper.

Things have been very busy in my life lately, though--I'm lucky if I get 3 or 4 hours a week to work on the next release, so it's taking a while. But I AM still working on it, and you will see this (and a few other) features sooner or later.
ComicVine Scraper - ...
Category: Scripts
emo
That does seem rather large, but...

Have you scraped all of them, thus adding cover images to them? I can't remember off the top of my head how large those cover images are, but for the sake of example, let's assume they are quite large: 1 MB each. Then:

1 MB per cover image x 24000 fileless comics = approximately 24 GB

So yeah, I guess 12 gigs could be reasonable for your database backup, if you have cover images.
ComicRack backup que ...
Category: Help
emo
Great job (as usual), I love it!
Tips & Tricks Manual ...
emo
One way to implement this feature is to make the scraper check for a special file called "cvinfo". The contents of this file would just be a series ID number from Comic Vine.

So anytime the scraper is scraping a comic, it would look for a "cvinfo" file in that comic's directory, and if it finds one, it will automatically use the file's contents to define the series for that comic (rather than asking you to specify it manually.)

This is exactly how the XBMC media player's TVDB.com scraper works...it's a nice way to implement this feature, because novice users (or those who don't want "strict series" directories) will never accidentally get into trouble with it.
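
A minimal Python sketch of the "cvinfo" lookup described above, assuming the file holds nothing but the numeric series ID (the function name and the handling of a missing or malformed file are assumptions, not a finished design):

```python
import os

CVINFO_FILENAME = "cvinfo"

def read_series_id(comic_path):
    """Return the Comic Vine series ID from a 'cvinfo' file next to the comic.

    Returns None if there is no such file or its contents are not a number,
    in which case the scraper would ask for the series as usual.
    """
    directory = os.path.dirname(os.path.abspath(comic_path))
    cvinfo_path = os.path.join(directory, CVINFO_FILENAME)
    try:
        with open(cvinfo_path, "r") as handle:
            contents = handle.read().strip()
    except (IOError, OSError):
        return None  # no cvinfo file in this directory
    return int(contents) if contents.isdigit() else None
```

Because a missing file just returns None, directories without a "cvinfo" file behave exactly as they do today.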
ComicVine Scraper - ...
Category: Scripts